CONTENT  ANALYSIS  AND  THE  ORGANIZATION 
OF  COMBAT  INTELLIGENCE  DATA


Murray  S.  Miron  and  Samuel  M.  Patten
Syracuse  University  Research  Corporation

and 

Stanley  M.  Halpin 

Army  Research  Institute  for  the  Behavioral  and  Social  Sciences 


BATTLEFIELD  INFORMATION  SYSTEMS  TECHNICAL  AREA 


U.  S.  Army 

Research  Institute  for  the  Behavioral  and  Social  Sciences 


DISTRIBUTION  STATEMENT  A

Approved  for  public  release;
Distribution  Unlimited


February  1978 


Number 


Intelligence  Systems 


Research


CONTENT  ANALYSIS  AND  THE  ORGANIZATION  OF
COMBAT  INTELLIGENCE  DATA


Stanley  M.  Halpin 

Army  Research  Institute  for  the  Behavioral  and  Social  Sciences 


Submitted  by: 

Edgar  N.  Johnson,  Chief 

Battlefield  Information  Systems  Technical  Area 


Approved  by: 


Joseph  Zeidner,  Director
Organizations  and  Systems  Research 
Laboratory 


J.  E.  Uhlaner,  Technical  Director
U.S.  Army  Research  Institute  for  the
Behavioral  and  Social  Sciences

Research  Memorandums  are  informal  reports  on  technical  research 
problems.  Limited  distribution  is  made,  primarily  to  personnel  engaged 
in  research  for  the  Army  Research  Institute. 


CONTENT  ANALYSIS  AND  THE  ORGANIZATION  OF 
COMBAT  INTELLIGENCE  DATA 


INTRODUCTION 

In order to process intelligence information expeditiously and accurately,
it is essential that incoming information be systematically organized.
Such organization provides a means for categorizing, differentiating, and
integrating intelligence for retrieval, evaluation, and interpretation.
This report examines and tests an application of a high-speed data
processing technique designed to provide organizational structure for
incoming intelligence automatically. The procedure uses a system of
computer routines known as the General Inquirer, which was developed for
the analysis of message content. The routines, originally devised by a
team of researchers at Harvard University (see Stone et al., 1962), have
been modified and upgraded extensively by J. Philip Miller of St. Louis
University (Miller and Psathas, 1968) and, to a lesser extent, by the
senior author.

The approach to the organization of intelligence information represented
by these computer procedures involves the automatic identification and
cataloging of a set of previously selected word and phrase forms in the
text of the intelligence reports received from the field. Critical word
occurrences in the messages are organized into a set of concept categories
flexibly defined by the intelligence analyst. This flexibility, which is
essential if the system is to be responsive to the specifics of either
particular situations or the particular needs of any given analyst, is
achieved through the use of a series of user-oriented programs which
employ a syntax readily mastered by the analyst for specifying
identification and retrieval operations. Thus, although the programs are
internally quite complex, from the standpoint of the analyst-user they are
simple to use, allowing for a wide range of options which place all of the
computational burden on the computer.

At the time that the General Inquirer was developed, most computer
techniques were such that numeric processing was comparatively easy and
text processing difficult. The General Inquirer was a much-needed tool
in fields that dealt with textual data. The original authors described
it as:


...a set of computer programs to (a) identify systematically,
within text, instances of words and phrases that belong to
categories specified by the investigator; (b) count occurrences
and specify co-occurrences of these categories; (c) print and
graph tabulations; (d) perform statistical tests; and (e) sort
and regroup sentences according to whether they contain
instances of a particular category or combination of categories
(Stone et al., 1966, p. 68).


In content analysis, the General Inquirer functions as a well-trained
clerk who assigns particular categories (specified by the investigator
before the analysis) to words and/or combinations of words. The IBM
S/360-S/370 versions of the General Inquirer, called the Inquirer II
(abbreviated I/II), retain this capability and also allow more elaborate
searches and analyses of the textual data, with more options available to
the user.

Content analysis may be defined as a data organization technique which
involves a systematic identification of theoretically relevant categories
in textual data. As employed in this project, the procedure provides a
method for deriving a taxonomy of intelligence reports. Categories of
report content and the rules by which they may be identified are defined
by means of a dictionary. For the purposes of this project, a special
purpose dictionary was constructed from analysis of intelligence reports
taken from the intelligence journals of the 28th Infantry Division for the
10-15 December 1944 period just before the German Ardennes
counteroffensive known as the Battle of the Bulge. This dictionary
provides a taxonomy of the content of the military messages; the procedure
described below then organizes, integrates, and classifies the messages on
the basis of their content similarities and differences. Order is imposed
on the otherwise unorganized reports through the identification of
dictionary-defined concepts in the reports. Reports which polythetically
share the greatest numbers of concept occurrences will be considered
"similar," and separated from those not sharing such occurrences of
concepts.
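The notion of reports "sharing" concept occurrences can be illustrated with a minimal sketch. It is not the study's procedure, which relies on correlations and factor analysis; it simply counts the concept occurrences two reports have in common. The report identifiers, the tallies, and the function name shared_occurrences are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept tallies for three reports. The identifiers and counts
# are invented for illustration and are not taken from the study's data.
reports = {
    "R01": Counter({"ENEMY": 3, "VEHICLES": 2, "MOVEMENT": 2}),
    "R02": Counter({"ENEMY": 2, "VEHICLES": 1, "EVENING": 1}),
    "R03": Counter({"FRIENDLY": 2, "PATROLS": 1, "MORNING": 1}),
}

def shared_occurrences(a, b):
    """Number of concept occurrences common to both reports.

    Counter's & operator keeps, for each concept, the smaller of the
    two counts, i.e. the multiset intersection.
    """
    return sum((a & b).values())

# Rank every pair of reports, most shared occurrences first.
pairs = sorted(
    combinations(sorted(reports), 2),
    key=lambda p: shared_occurrences(reports[p[0]], reports[p[1]]),
    reverse=True,
)
most_similar = pairs[0]
```

Under this simplified measure, the pair at the head of the list would be grouped together first; reports with no concepts in common score zero and remain apart.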

Such  an  approach  to  the  organization  of  intelligence  data  allows 
the  individual  analyst  to  flexibly  define  his  own  categories  of  message 
content  and  structure.  Additionally,  it  allows  for  continuous  updating 
and  modification  of  the  organizational  schema  as  the  situation  requires. 

In  a  field  application  of  such  a  system,  the  sequentially  received  reports 
would  be  entered  into  the  computer  according  to  the  conventions  outlined 
below  and  successive  factor  structures  would  be  computed  for  the  body  of 
reports  forming  the  data  base  at  any  given  time.  As  each  successive  report 
is  added  to  the  data  base,  or  at  any  other  appropriate  time,  e.g.,  at  the 
end  of  the  day,  a  new  factor  structure  and  organization  would  be  computed. 

This report presents a factor structure based on an analysis of 40
reports over six days; this represents the convergent outcome of what
would have been a succession of structures computed after six days of
reports had been received. As these intermediate structures converge,
many of the category variables would not be employed. Estimates of the
efficiency of the categories and these intermediary structures may be
made. Category definitions in the dictionary may be modified and updated
by the analyst, as required, to produce an organizational structure which
he deems to be meaningful and which at the same time efficiently,
economically, and successfully tags the content of the reports. For such
purposes, the analyst would inspect the untagged word file to ascertain
which items should be added to the dictionary and then would produce a new
structure. Over a period of time, each analyst or installation would
thus build a dictionary of intelligence concepts which would be uniquely
suited to the type of material and situations being analyzed.


METHOD 

The general procedure for the analysis was: (1) a sample of 40
intelligence reports was keypunched into IBM cards according to a set of
conventions, (2) a dictionary of critical concepts was constructed, (3)
rules for identification of these concepts as they occurred in the text
were developed, (4) tabulations of the occurrences of the identified
concepts were calculated for each message, and (5) correlations and factor
analyses were computed using these tabulations of identified concepts.

Full  details  of  the  syntax  and  computer  routines  employed  are  in  the 
Appendix. 
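Step (5) of the procedure can be illustrated with a short sketch that computes product-moment correlations between concept tabulations. The tabulated counts are hypothetical, the function name pearson is an assumption, and the factor analysis that follows the correlations in the study is not reproduced here.

```python
import math

# Hypothetical tabulations: occurrences of each concept in four reports.
tabulations = {
    "ENEMY":    [2, 0, 1, 3],
    "VEHICLES": [4, 0, 2, 6],
    "MORNING":  [0, 3, 1, 0],
}

def pearson(x, y):
    """Product-moment correlation of two equal-length count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a concept with constant counts carries no information
    return sxy / math.sqrt(sxx * syy)

# Full concept-by-concept correlation matrix, keyed by concept pair.
concepts = sorted(tabulations)
correlations = {
    (a, b): pearson(tabulations[a], tabulations[b])
    for a in concepts
    for b in concepts
}
```

A matrix of this kind would serve as the input to a factor analysis routine; concepts that rise and fall together across reports correlate positively and would tend to load on the same factor.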

The intelligence reports (see Miron, Patten and Halpin, 1977) and the
specially constructed dictionary served as inputs to the computer system.
The program which assigned the categories (tagging program) read in the
data a sentence at a time, then located each word in the dictionary.
Instructions were given by the dictionary as to what category should be
assigned and/or what searches of the context in which the word occurred
should be made. The instructions were then executed. When the analysis
of a sentence was completed (i.e., all the categories to be assigned had
been assigned and all searches had been completed), the tagging program
wrote out that sentence and read in the next. The process continued a
sentence at a time until all the reports had been tagged. The output
from the tagging program was a tagged file which was stored so that
retrieval, tabulation, and statistical analyses of the data could be made.
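The sentence-at-a-time tagging cycle described above can be sketched as follows. This is an illustration under simplifying assumptions, not the Inquirer II tagging program: the four-entry dictionary is hypothetical, context searches are omitted, and the names tag_sentence and tag_reports are invented for the example.

```python
import re

# Hypothetical miniature dictionary: entry word -> list of concept names.
DICTIONARY = {
    "ENEMY": ["ENEMY"],
    "TRUCKS": ["VEHICLES"],
    "HEARD": ["AUDITORY"],
    "NIGHT": ["EVENING"],
}

def tag_sentence(sentence):
    """Look up every word of one sentence; untagged words get an empty list."""
    words = re.findall(r"[A-Za-z0-9]+", sentence.upper())
    return [(word, DICTIONARY.get(word, [])) for word in words]

def tag_reports(text):
    """Read the text one sentence at a time and collect the tagged sentences."""
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    return [tag_sentence(s) for s in sentences]

# The returned list plays the role of the stored tagged file.
tagged_file = tag_reports("Enemy trucks heard at night. No other activity.")
```

Keeping the untagged words in the output, rather than discarding them, mirrors the report's use of the untagged word listing to revise the dictionary.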


DATA  INPUT  AND  FORMAT 

Thirty-three intelligence reports actually received by the 28th Infantry
Division, and seven false reports designed to test the capabilities of
the system, served as the data base in the present study. These
intelligence messages range in content from trivial sighting reports of
horsedrawn vehicles to G2 summaries of considerable tactical and strategic
importance. The procedures outlined here may be followed with any sample
of messages, without restriction as to the type or source of the message.
However, it is expected that more meaningful report taxonomies will be
obtained if each message covers a relatively limited scope of information.

The report data input is prepared as continuous text as if it were being
typed. Each different character of the input text is assigned a function.
For example, an alphanumeric character (A-Z, a-z, 0-9) is considered part
of a word and a blank indicates the end of a word. The period (.),
exclamation point (!), and question mark (?) indicate the end of a
sentence. Braces ({ }), greater than and less than signs (> <), and the
dollar sign ($) indicate message identification, titles, and comments
which are not to be searched for in the dictionary and not given a content
category.
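A character-by-character scanner along these lines can be sketched as follows. The treatment of braces as paired delimiters enclosing skipped text is an assumption made for the example; the angle-bracket and dollar-sign conventions are left out because their exact usage is defined in the Appendix. The function name scan is hypothetical.

```python
def scan(text):
    """Split input text into sentences, each a list of words.

    Alphanumeric ASCII characters build words, any other character ends
    the current word, and . ! ? end the sentence. Text between { and }
    is skipped (assumed here to hold message identification or comments).
    """
    sentences, words, word = [], [], []
    skipping = False
    for ch in text:
        if skipping:
            if ch == "}":
                skipping = False
            continue
        if ch == "{":
            if word:
                words.append("".join(word))
                word = []
            skipping = True
        elif ch.isascii() and ch.isalnum():
            word.append(ch)
        else:
            if word:
                words.append("".join(word))
                word = []
            if ch in ".!?" and words:
                sentences.append(words)
                words = []
    # Flush any text left over without a closing sentence terminator.
    if word:
        words.append("".join(word))
    if words:
        sentences.append(words)
    return sentences

parsed = scan("{MSG 12} Two tanks seen. Moving east!")
```

Assigning one function per character class keeps the scanner a single pass over the text, which suits input prepared as continuous text.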


DICTIONARY  PREPARATION 

The major task in using the Inquirer II system is the creation of a
dictionary. A content analysis category (called a concept in the Inquirer
systems) consists of a number of language signs (such as words, idioms,
and phrases) which together represent a variable in the investigator's
theory. For example, the analyst concerned about the movements of a
particular enemy division might be interested in identifying the number
of references to that division in reports already received and therefore
constructs a category, e.g., "the 26th Volksgrenadier Division," composed
of references to that division (Volksgrenadier, VG, 26th VG Division, 77th
VG Regiment, 78th VG Regiment, etc.). The basic procedure in content
analysis is to identify (tag) these signs when and if they occur in the
text as instances of a particular concept, and score them as such.

The analyst would seldom carry out a content analysis with a single
concept. Instead, he is usually interested in examining relations of a
number of semantic categories as they appear in intelligence documents.
Therefore, we use a cluster of concepts, referred to as a content
analysis dictionary. For the Inquirer system the exposition of this
dictionary is in a special language, the Dictionary Definition Language
(DDL). Details of the syntax of this language are in the Appendix.

Category  Construction.  The  first  task  in  dictionary  construction  is 
to  define  the  categories  or  concepts  which  were  to  be  identified  in  the 
reports  under  consideration.  The  listing  of  these  categories,  as  well 
as  preliminary  conceptual  definitions  of  each  of  the  concepts,  forms  an 
important  nucleus  for  the  actual  construction  of  the  dictionary.  From 
this  definition  of  the  concepts  and  their  interrelationships,  the  Concept 
Name  Paragraph  (CNP)  is  constructed.  The  Inquirer  System  allows  an 
analyst  the  flexibility  of  assigning  several  different  types  of  inter¬ 
relations  between  concepts.  The  first  of  these  indicates  a  one-to-one 
relation  between  the  concept  and  the  tags.  In  the  Inquirer  System  con¬ 
cepts  have  names  and  tags  have  numbers.  The  concepts  are  used  in  con¬ 
structing  the  entry  used  in  the  dictionary  and  at  post-processing  time  for 
tabulations  and  listings;  during  the  tagging  and  searching,  it  is  the 
tag  numbers  which  are  used. 

Table 1 displays the Concept Name Paragraph (CNP) or outline structure
of the military dictionary devised for this project. These concepts,
in their subordinate levels of organization, are indicated by successive
subdividing strings of number identifiers. Thus, in Table 1, AREA OF
OPERATIONS (1) is subdivided into locations (1,2), terrain (1,17), and
urban (1,18) with subcategories for each, e.g., as in this case,
coordinates (1,2,3) as a subdivision of locations (1,2) and even further
subdivision of the coordinates into sectors along the forward edge of the
battle area. The methodology allows the analyst to add to or subtract
from the dictionary, to reorganize categories, or to change the entries.
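The subdividing strings of number identifiers lend themselves to a simple representation as tuples, in which a concept's superordinate categories are the prefixes of its tag string. The sketch below uses the tag numbers quoted above for AREA OF OPERATIONS; the Python representation and the function names ancestors and subsumes are assumptions made for illustration, not part of the Inquirer system.

```python
# Tag strings for part of the AREA OF OPERATIONS branch, as given in the text.
CNP = {
    "AREA OF OPERATIONS": (1,),
    "LOCATIONS": (1, 2),
    "COORDINATES": (1, 2, 3),
    "TERRAIN": (1, 17),
    "URBAN": (1, 18),
}

def ancestors(concept):
    """Superordinate concepts, from the top level down to the direct parent."""
    path = CNP[concept]
    by_path = {tags: name for name, tags in CNP.items()}
    return [by_path[path[:depth]] for depth in range(1, len(path))]

def subsumes(general, specific):
    """True if the general concept's tag string is a prefix of the specific one."""
    g, s = CNP[general], CNP[specific]
    return s[:len(g)] == g
```

With this layout, a tabulation requested at the level of LOCATIONS could include every occurrence tagged COORDINATES, since the longer tag string carries the shorter one as its prefix.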

The concept categories used in the present study evolved from our earlier
attempts to produce a subjective taxonomy of intelligence information
(see Patten, 1974) and from a detailed examination of the Key Word in
Context (KWIC) output listings of the reports themselves (Figure 1).
No brief is made that this conceptual structure is either definitive or
exhaustive; the dictionary is presented simply as a part of this
demonstration of the methodological approach. However, as will be seen,
the empirical test of this dictionary does produce a practical taxonomy of
the reports on which it was tested.

Entry Selections. After the preliminary definition of the concepts and
their interrelations has been completed, the next task is to determine
what entries are to be in the dictionary. Two separate philosophies
appear at this point. One is that nearly every word in the intelligence
reports to be analyzed should be in the dictionary. This philosophy of
an exhaustive dictionary has certain methodological attractions, chief
among which are that the analyst has considered every possible word and
can take account of those words which have not been found or tagged in
the dictionary. This provides some measure of adequacy of the dictionary.
Previous research has tended to show that a dictionary in the 3,000 to
4,000 word range will tag somewhere between 90 and 98% of ordinary texts.
The other philosophy is that of a selective dictionary in which only
words which are relevant to the concepts at hand are included. In most
cases with the selective dictionary, it is possible to determine
exhaustive lists of words which are to be assigned a given concept. This
is the approach which has been taken in this project. For example, the
concept COORDINATES exhaustively categorizes the locus of any action in
the AREA OF OPERATIONS. Similarly, ORGANIZATION includes all
organizational sub-divisions encountered in the message sample.
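The measure of dictionary adequacy mentioned above, the proportion of text words that receive a tag, can be computed in a few lines. The sketch below is illustrative only; the sample words, the miniature entry list, and the function name coverage are hypothetical.

```python
def coverage(words, entries):
    """Return the percentage of words found in the dictionary and the
    alphabetized list of distinct untagged words."""
    upper = [w.upper() for w in words]
    tagged = [w for w in upper if w in entries]
    untagged = sorted({w for w in upper if w not in entries})
    percent = 100.0 * len(tagged) / len(upper) if upper else 0.0
    return percent, untagged

# Hypothetical sample sentence and a four-entry selective dictionary.
sample = "ENEMY PATROL SEEN NEAR THE RIVER".split()
entries = {"ENEMY", "PATROL", "NEAR", "RIVER"}
percent_tagged, untagged_words = coverage(sample, entries)
```

For a selective dictionary the percentage will naturally fall below the 90 to 98% range cited for exhaustive dictionaries, so the untagged list is the more useful output: it shows the analyst which words were passed over.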

The dictionary underwent a number of revisions before reaching its final
form. These revisions were made after inspecting the listings of untagged
words and the KWIC outputs. In addition, a logical taxonomy of
intelligence information was constructed according to a set of subjective
procedures (Patten, 1974). In brief, the dictionary construction
procedures seek to group significant word occurrences in the report sample
under a set of major and minor category divisions of roughly equal scope.
The total process is selective in that it uses only those items which
suggest logically derived categories.

Entry Name Paragraph (ENP). Once the list of potential entry words is
constructed, the relation between entry words and the concepts is
specified. For this purpose, an exhaustive listing of the vocabulary of
the reports may be made by means of the Key Word in Context (KWIC) routine
(Figure 1). Where more than one word sense appears in a portion of text


Table 1

Concept Name Paragraph (CNP) of the Military Dictionary

AREA OF OPERATIONS = 1.
   locations = 1, 2.
      coordinates = 1, 2, 3.
         north west = 1, 2, 3, 4.
         north east = 1, 2, 3, 5.
         north central west = 1, 2, 3, 6.
         north central east = 1, 2, 3, 7.
         south central west = 1, 2, 3, 8.
         south central east = 1, 2, 3, 9.
         south west = 1, 2, 3, 10.
         south east = 1, 2, 3, 11.
      zone = 1, 2, 12.
         zone north = 1, 2, 12, 13.
         zone north central = 1, 2, 12, 14.
         zone south central = 1, 2, 12, 15.
         zone south = 1, 2, 12, 16.
   terrain = 1, 17.
   urban = 1, 18.

BRANCH = 19.
   armor = 19, 20.
   artillery = 19, 21.
   infantry = 19, 22.

CHANGE = 23.
   decrease = 23, 24.
   increase = 23, 25.

DIRECTION = 26.
   north = 26, 27.
   east = 26, 28.
   south = 26, 29.
   west = 26, 30.

EXTENT = 31.
   large = 31, 32.
   small = 31, 33.

INTELLIGENCE = 34.
   cognition = 34, 35.
   reconnaissance = 34, 36.

LOGISTICS = 37.

MOVEMENT = 38.

ORGANIZATION = 39.
   army = 39, 40.
   army air corps = 39, 41.
   corps = 39, 42.
   division = 39, 43.
   regiment = 39, 44.
   battalion = 39, 45.
   company = 39, 46.
   headquarters = 39, 47.
   team = 39, 48.

ORGANIZATION STATUS = 49.
   friendly = 49, 50.
   enemy = 49, 51.

PERSONNEL = 52.
   civilian = 52, 53.
   military = 52, 54.
      deserters = 52, 54, 55.
      enlisted = 52, 54, 56.
      officers = 52, 54, 57.
      POWs = 52, 54, 58.

PLANES = 59.
   bombers = 59, 60.
   observation = 59, 61.

SENSOR = 62.
   auditory = 62, 63.
   visual = 62, 64.

TACTICS = 65.
   defense = 65, 66.
   firing = 65, 67.
   flares = 65, 68.
   offense = 65, 69.
   patrols = 65, 70.

TIME = 71.
   morning = 71, 72.
   afternoon = 71, 73.
   evening = 71, 74.

TRANSPORTATION = 75.
   aerial = 75, 76.
   surface = 75, 77.
      vehicles = 75, 77, 78.
      trains = 75, 77, 79.
   water = 75, 80.

WEAPONS = 81.
   artillery piece = 81, 82.
   machine gun = 81, 83.
   mortars = 81, 84.
   small arms = 81, 85.
   tanks = 81, 86.



Figure  1.  Illustration  of  Key  Word  in  Context  (KWIC)  Output  Listing


and the distinction between the word senses is deemed important,
conditional rules are constructed to distinguish between the various word
senses. The KWIC is extremely useful in determining what rules will work
for this differentiation, because it provides a listing of every unique
word along with both the textual context in which the word appears and
the message identification.
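A Key Word in Context listing of the kind described can be sketched in a few lines. The sketch is a simplified stand-in for the KWIC routine, whose actual output format is the one shown in Figure 1; the two messages, the context width, and the function name kwic are hypothetical.

```python
def kwic(messages, width=2):
    """Build an alphabetized keyword-in-context listing.

    messages maps a message identification to its text. Each row holds
    the keyword, the message identification, and up to `width` words of
    context on either side.
    """
    rows = []
    for msg_id, text in messages.items():
        words = text.upper().split()
        for i, word in enumerate(words):
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            rows.append((word, msg_id, left, right))
    return sorted(rows)

# Two hypothetical messages in which ARMS and ARMY need different handling.
listing = kwic({
    "M1": "small arms fire heard",
    "M2": "army air force planes",
})
```

Reading down such a listing, the analyst can see at a glance that ARMS preceded by SMALL calls for a different concept than ARMS elsewhere, which is the kind of observation that motivates a conditional entry.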

Table 2, below, gives the ENP for the military dictionary in this study.
It will be observed that many of the entries are conditional in form in
order to differentiate concepts. These conditional entries take the form
of IF statements which indicate a search of the text for the specified
word contexts forward (+) or backward (-) from the dictionary entry word.
Thus, for example, the entry word AIR, in Table 2, indicates that a
conditional search is to be made to determine whether the one following
word fulfills the stated conditions. The two numbers (1,1) signify that
the search begins and ends one word to the right (forward) of AIR. If
the word following AIR is FORCE, then AIR is classified as an instance of
the concept Army Air Corps. Note that the textual or entry words of the
dictionary appear to the left of the colon (:) and the concepts to which
they are to be assigned are to the right of the colon (:). For further
explanation of the syntax of these entries see the Appendix at the end of
this report.
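The logic of a conditional entry can be sketched as follows. This is not an implementation of the Dictionary Definition Language; it models only a single IF test per entry, using the first branch of the AIR entry and the ARMS entry from Table 2. The data layout and the names window, RULES, DEFAULTS, and tag_word are assumptions made for the example.

```python
def window(words, i, start, end):
    """Words at offsets start..end (inclusive) from position i, clipped to
    the sentence. Negative offsets look backward, positive ones forward."""
    return [words[j] for j in range(i + start, i + end + 1)
            if 0 <= j < len(words)]

# entry word -> list of ((start, end), context word, concepts if matched)
RULES = {
    "AIR": [((1, 1), "FORCE", ["ARMY AIR CORPS"])],
    "ARMS": [((-1, -1), "SMALL", ["SMALL ARMS"])],
}
# Concepts assigned when no condition matches (the ELSE branch).
DEFAULTS = {"AIR": [], "ARMS": ["WEAPONS"]}

def tag_word(words, i):
    """Evaluate the conditional entry for the word at position i."""
    entry = words[i]
    for (start, end), context_word, concepts in RULES.get(entry, []):
        if context_word in window(words, i, start, end):
            return concepts
    return DEFAULTS.get(entry, [])
```

For instance, tag_word("SMALL ARMS FIRE".split(), 1) follows the THEN branch of ARMS, while the same entry word with no preceding SMALL falls through to WEAPONS.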


TAGGED  OUTPUT  LISTING 

The initial output of the Inquirer tagging program is the original data
plus the categories that have been assigned and stored for future use on
some output medium specified by the analyst (e.g., tape, disk, or drum).
If the analyst chooses, the output from category assignment may be listed
so that the text may be inspected to see how well the category "fits" the
data.

Those  words  which  did  not  receive  any  categorization  are  underlined 
in  the  listing  so  that  the  user  knows  which  characteristics  of  the  data 
were  not  handled  by  any  of  the  dictionary  routines.  Moreover,  after  having 
inspected  the  listing  of  the  output,  the  analyst  may  resubmit  the  original 
output  for  re-tagging  by  the  same  (usually  updated)  dictionary  or  by 
additional  dictionaries. 

Figure  2  provides  an  example  of  listed  tagged  output.


TABULATIONS 

Data that have been tagged are tabulated according to specifications
provided by the user. The tabulation lists all the categories in which
the analyst is interested and provides the raw frequency of occurrence
for each concept in the tabulated text. The tabulate program also provides
the total number of units in the document such as words, sentences or



Table 2

Entry Name Paragraph (ENP) of the Military Dictionary

ACTIVITY              :MOVEMENT.
AFTERNOON             :TIME.
AGGRESSIVE            :TACTICS.
AIR                   :IF WORD (1,1) = 'FORCE'
                          THEN ARMY AIR CORPS
                       ELSE IF WORD (-1,-1) = 'G2'
                          THEN (OFFICERS; FRIENDLY)
                       ELSE.
ALMUroUN              :URBAN.
AMERICAN              :FRIENDLY.
APPEAR                :VISUAL.
AREA                  :LOCATION.
ARMOR                 :BRANCH.
ARMS                  :IF WORD (-1,-1) = 'SMALL' THEN SMALL ARMS ELSE WEAPONS.
ARMY                  :ARMY;
                       IF WORD (-1,-1) = '15TH'
                          THEN ENEMY
                       ELSE FRIENDLY.
ARRIVE                :MOVEMENT.
ARTILLERY             :ARTILLERY;
                       IF WORD (1,1) = 'PIECE'
                          THEN (ARTILLERY PIECE; FRIENDLY; EXIT)
                       ELSE IF WORD (-2,-1) = '299TH' |
                               WORD (-1,-1) = 'CORPS' |
                               WORD (-2,-1) = '28TH'
                          THEN FRIENDLY
                       ELSE ENEMY.
ATTACK                :OFFENSE.
BANK                  :TERRAIN.
BATTALION             :BATTALION;
                       IF WORD (1,1) = '295TH' |
                          WORD (1,1) = '316TH' |
                          WORD (-1,-1) = 'GUN' |
                          WORD (1,1) = '915TH'
                          THEN ENEMY
                       ELSE FRIENDLY.
BAULER                :URBAN.
BEFORE                :TIME.
BEHIND                :LOCATION.
BELIEVE               :COGNITION.
BEND                  :TERRAIN.
BERG                  :URBAN.
BETTINGEN             :URBAN.
BIESDORF              :URBAN.
BIT                   :IF WORD (-1,-1) = 'A' &
                          WORD (-2,-2) = 'QUITE'
                          THEN LARGE
                       ELSE.


BRANDSCHEID           :URBAN.
BRIDGE                :TERRAIN.
BUILDUP               :OFFENSE.
BULLET                :FIRING.
BURST                 :FIRING.
CAR                   :IF WORD (-1,-1) = 'STAFF'
                          THEN VEHICLES
                       ELSE IF WORD (-... = 'FREIGHT'
                          THEN (LOGIST...
                       ELSE.
CARLOADS              :TRAINS.
CARRY                 :MOVEMENT.
CENTER                :LOCATION.
CHANGE                :CHANGE.
CIVILIAN              :CIVILIAN.
COLLIDE               :MOVEMENT.
COLOGNE-BONN-DUEREN   :URBAN.
COMPANY               :COMPANY; ENEMY.
CONFIRM               :COGNITION.
CONSIDERABLE          :LARGE.
CONVOY                :VEHICLES.
COORDINATES           :COORDINATES.
CORPS                 :FRIENDLY; CORPS.
CP                    :HEADQUARTERS.
CROSS                 :MOVEMENT.
CURRENT               :WATER.
DARK                  :EVENING.
DARKNESS              :EVENING.
DASBURG               :URBAN.
DAWN                  :MORNING.
DAY                   :MORNING; AFTERNOON.
DEFENSE               :DEFENSE.
DESERTER              :DESERTERS.
DETECT                :SENSOR.
DETERMINE             :COGNITION.
DIRECTION             :DIRECTION.
DISPLAYED             :VISUAL.
DISPOSITION           :LOCATION.
DIVISION              :DIVISION;
                       IF WORD (-1,-1) = 'VOLKSGRENADIER' |
                          WORD (-1,-1) = 'DAM7irBl' |
                          WORD (-1,-1) = ... |
                          WORD (-2,-2) = 'ECHELON' |
                          WORD (-4,-1) = 'GERMAN' |
                          WORD (-4,-4) = '320TH'
                          THEN ENEMY
                       ELSE FRIENDLY.
DOUBLETIME            :MOVEMENT.


DURING                :TIME.
EARLY                 :TIME.
EAST                  :EAST.
EASTWARD              :EAST.
ECHELON               :ENEMY; ORGANIZATION.
ECHTERNACH            :URBAN.
ENEMY                 :ENEMY.
ENTIRE                :LARGE.
EQUIPMENT             :LOGISTICS.
ESCAPE                :MOVEMENT.
ESCHFELD              :URBAN.
ESTIMATE              :COGNITION.
EVENING               :EVENING.
EVIDENCE              :COGNITION.
EXPLOSIVES            :LOGISTICS.
EXTEND                :EXTENT.
FIRE                  :FIRING.
FLARE                 :FLARES.
FLEW                  :MOVEMENT.
FOLLOW                :IF WORD (1,1) = 'THE'
                          THEN MOVEMENT
                       ELSE.
FOOT                  :IF WORD (0,1) = 'TRAFFIC'
                          THEN (SURFACE; PERSONNEL)
                       ELSE.
FORCE                 :ORGANIZATION;
                       IF WORD (-1,-1) = 'ENEMY'
                          THEN ENEMY
                       ELSE FRIENDLY.
FORMATION             :ORGANIZATION.
FORMER                :TIME.
FORTRESS              :DEFENSE.
FRONT                 :LOCATION.
G2                    :INTELLIGENCE; FRIENDLY.
GEICHLINGEN           :URBAN.
GEMUEND               :URBAN.
GENERATORS            :LOGISTICS.
GERMAN                :ENEMY.
GUARD                 :ENLISTED.
GUN                   :IF WORD (-3,-1) = 'MACHINE'
                          THEN MACHINE GUN
                       ELSE ARTILLERY.
HARASSING             :SMALL.
HEADED                :MOVEMENT.
HEADQUARTERS          :HEADQUARTERS.
HEARD                 :AUDITORY.
HEAVY                 :LARGE.
HEINERSCHEID          :URBAN.


HILL                  :TERRAIN.
HOURS                 :TIME.
INCREASE              :INCREASE.
INDICATE              :COGNITION.
INFANTRY              :INFANTRY.
INFILTRATION          :OFFENSE.
JUNCTION              :TERRAIN.
KALBORN               :URBAN.
LARGE                 :LARGE.
LAUNCH                :MOVEMENT.
LED                   :MOVEMENT.
LEFT                  :MOVEMENT.
LIELER                :URBAN.
LIGHT                 :SMALL.
LINE                  :LOCATION.
LISTENING             :AUDITORY.
LITTLE                :SMALL.
LOADS                 :TRANSPORTATION.
LOCATION              :LOCATION.
LUXEMBOURG            :URBAN.
MAJOR                 :LARGE.
MAN                   :PERSONNEL.
MANY                  :LARGE.
MAP                   :INTELLIGENCE.
MARKED                :LARGE.
MARSHALLING           :LOGISTICS.
MEN                   :PERSONNEL.
MESSAGE               :INTELLIGENCE.
MILITARY              :MILITARY.
MINES                 :DEFENSE.
MINUTES               :TIME.
MISSING               :IF WORD (-1,-1) = 'RECONNAISSANCE'
                          THEN EXIT
                       ELSE TACTICS.
MORNING               :MORNING.
MORTAR                :MORTARS.
MOSELLE               :TERRAIN.
MOST                  :LARGE.
MOTOR                 :VEHICLES.
MOTORCYCLES           :VEHICLES.
MOVE                  :MOVEMENT.
MOVEMENT              :MOVEMENT.
MUENCHEN-GLADBAD      :URBAN.
NATIONAL              :IF WORD (-1,-1) = 'GERMAN'
                          THEN (CIVILIAN; ENEMY)
                       ELSE PERSONNEL.
NEAR                  :LOCATION.
NIEDERGECKLER         :URBAN.


NIEDERSGEGEN          :URBAN.
NIGHT                 :EVENING.
NORTH                 :NORTH.
NORTHERN              :NORTH.
NOTES                 :COGNITION.
NOW                   :TIME.
OBSERVE               :VISUAL.
OBSERVERS             :VISUAL.
OCCASIONAL            :SMALL.
OFFENSIVE             :OFFENSE.
OFFICER               :IF WORD (-1,-1) = 'NONCOMMISSIONED'
                          THEN ENLISTED
                       ELSE OFFICERS.
OLD                   :TIME.
OPPOSITE              :LOCATION.
ORDER                 :TACTICS.
ORMONT                :URBAN.
OUR                   :TERRAIN.
OVERCOATS             :LOGISTICS.
OVERHEARD             :AUDITORY.
PANZER                :ARMOR.
PAST                  :TIME.
PATCHES               :LOGISTICS.
PATROL                :PATROLS.
PERSONNEL             :PERSONNEL.
PILLBOX               :DEFENSE.
PILOTS                :OFFICERS; FRIENDLY.
PISTOL                :SMALL ARMS.
PLANES                :IF WORD (-1,-1) = 'OBSERVATION'
                          THEN (OBSERVATION; FRIENDLY)
                       ELSE ENEMY.
POINTS                :IF WORD (1,4) = 'FRONT'
                          THEN LOCATION
                       ELSE COGNITION.
POLICE                :CIVILIAN.
PONTOONS              :WATER.
POST                  :IF WORD (-1,-1) = 'OBSERVATION'
                          THEN VISUAL
                       ELSE IF WORD (-1,-1) = 'COMMAND'
                          THEN HEADQUARTERS
                       ELSE IF WORD (-1,-1) = 'LISTENING'
                          THEN AUDITORY
                       ELSE LOCATION.
POW                   :POWS.
PROJECTILES           :FIRING.
PULL                  :MOVEMENT.
QUIET                 :SMALL.
RADIOS                :TACTICS.


RAID                  :OFFENSE.
RAILROAD              :TRAINS.
RATE                  :EXTENT.
REAR                  :IF WORD (1,3) = 'AREA'
                          THEN LOCATION
                       ELSE ORGANIZATION.
RECONNAISSANCE        :RECONNAISSANCE.
REFLECTORS            :TACTICS.
REGIMENT              :REGIMENT;
                       IF WORD (-2,-1) = '295TH' |
                          WORD (-1,-1) = '320TH' |
                          WORD (-2,-1) = '352D' |
                          WORD (-2,-1) = '353D' |
                          WORD (-2,-1) = '78TH' |
                          WORD (-1,-1) = '316TH' |
                          WORD (-1,-1) = '915TH' |
                          WORD (-1,-1) = '423D' |
                          WORD (-2,-1) = '942D' |
                          WORD (-1,-1) = 'THEIR'
                          THEN ENEMY
                       ELSE FRIENDLY.
RELATIVES             :CIVILIAN.
RELIEVING             :TACTICS.
RESERVE               :IF WORD (-1,-1) = 'IN'
                          THEN TACTICS
                       ELSE LOGISTICS.
RHINE                 :TERRAIN.
RIFLE                 :SMALL ARMS.
RIVER                 :WATER.
ROAD                  :TERRAIN.
ROCKET                :ARTILLERY PIECE.
ROSCHEID              :URBAN.
ROTH
ROTTERDAM             :URBAN.
ROUNDS                :LOGISTICS.
RUINS                 :TERRAIN.
RUMOR                 :INTELLIGENCE.
RUNDSTEDT             :OFFICERS.
SAARBRUECKEN          :URBAN.
SALUTING              :MOVEMENT.
SAW                   :VISUAL.
SCATTERED             :SMALL.
SCHEID                :URBAN.
SE                    :LOCATION.
SEARCHLIGHT           :TACTICS.
SECRET                :INTELLIGENCE.
SECTOR                :LOCATION.
SEE                   :VISUAL.


SEEMS                 :COGNITION.
SEVERAL               :LARGE.
SHORTAGE              :SMALL.
SIDE                  :LOCATION.
SIGHTED               :VISUAL.
SIGNIFICANT           :LARGE.
SINGLE                :SMALL.
SLOPE                 :TERRAIN.
SMALL                 :SMALL.
SOLDIER               :ENLISTED.
SOME                  :SMALL.
SOON                  :TIME.
SOUND                 :AUDITORY.
SOUTH                 :SOUTH.
SOUTHEAST             :SOUTH.
SOUTHWARD             :SOUTH.
SOUTHWEST             :SOUTH.
SS                    :ENEMY.
STRASBOURG            :URBAN.
STRONGLY              :LARGE.
SUGGEST               :COGNITION.
SUMMARY               :INTELLIGENCE.
SUSPECTS              :COGNITION.
SYSTEM                :IF WORD (-1,-1) = 'RAILROAD' THEN TRAINS ELSE.
TACTICS               :TACTICS; OFFENSE.
TANK                  :ARMOR.
TEAM                  :TEAM.
THINLY                :SMALL.
TIGER                 :IF WORD (-1,1) = 'TANKS'
                          THEN (TANKS; ENEMY)
                       ELSE.
TIME                  :TIME.
TINTESMUEHLE          :URBAN.
TOWARDS               :DIRECTION.
TOWN                  :URBAN.
TRACKS                :SURFACE.
TRAFFIC               :TRANSPORTATION.
TRAIN                 :TRAINS.
TRIER                 :URBAN.
TROOPS                :IF WORD (-4,+0) = 'INFANTRY' THEN FRIENDLY ELSE ENEMY.
TRUCKS                :VEHICLES.
UNIFORMS              :LOGISTICS.
UNIT                  :ORGANIZATION; ENEMY.
UNLOADING             :MOVEMENT.
UNUSUAL               :COGNITION.
UPCOMING              :TIME.
VALLEY                :TERRAIN.
VEHICLE               :VEHICLES.


Table 2 (Cont'd)

Entry Name Paragraph (ENP) of the Military Dictionary

VEHICULAR         :VEHICLES.
VICINITY          :LOCATION.
VILLAGE           :URBAN.
VISUAL            :VISUAL.
VOLUME            :EXTENT.
WADED             :MOVEMENT.
WALK              :MOVEMENT.
WAR               :IF WORD (-3,-1)
                   THEN POWS
                   ELSE.
WATCH             :INTELLIGENCE.
WATER             :WATER.
WAXWEILLER        :URBAN.
WEAPON            :ARTILLERY.
WEEK              :TIME.
WEST              :WEST.
WESTERN           :WEST.
WHERE             :LOCATION.
WITHDRAWAL        :MOVEMENT.
WOMAN             :CIVILIAN.
YARD              :IF WORD (-1,-1)
                   THEN TRANSPO
                   ELSE.
ZWEIBRUECKEN      :URBAN.
0405              :MORNING.
0450              :MORNING.
0540              :MORNING.
0600              :MORNING.
0630              :MORNING.
0745              :MORNING.
0800              :MORNING.
0830              :MORNING.
0900              :MORNING.
0905              :MORNING.
0910              :MORNING.
1000              :MORNING.
1015              :MORNING.
1019              :MORNING.
1040              :MORNING.
1100              :MORNING.
1115              :MORNING.
1130              :MORNING.
1200              :MORNING.
1300              :AFTERNOON.
1400              :AFTERNOON.
1500              :AFTERNOON.


Table 2 (Cont'd)

Entry Name Paragraph (ENP) of the Military Dictionary

1550              :AFTERNOON.
1600              :AFTERNOON.
1800              :AFTERNOON.
1810              :AFTERNOON.
1830              :AFTERNOON.
1840              :AFTERNOON.
1930              :AFTERNOON.
1940              :AFTERNOON.
1950              :AFTERNOON.
2000              :AFTERNOON.
2015              :EVENING.
2100              :EVENING.
2113              :EVENING.
2130              :EVENING.
2200              :EVENING.
2215              :EVENING.
2245              :EVENING.
2400              :EVENING.
0045              :EVENING.
0300              :EVENING.
0400              :EVENING.
102200            :EVENING.
132300            :EVENING.
132345            :EVENING.
140001            :EVENING.
142400            :EVENING.
908750            :NW1.
995196            :NW1.
040948            :NE2.
064963            :NE2.
0696              :NE2.
0892              :NE2.
9881              :NE2.
996806            :NE2.
821675            :NCW3.
838705            :NCW3.
840673            :NCW3.
847594            :NCW3.
851581            :NCW3.
854584            :NCW3.
871541            :NCW3.
875543            :NCW3.
0167              :NCE4.
1454              :NCE4.
850672            :NCE4.
850673            :NCE4.
854680            :NCE4.
871557            :NCE4.


Table 2 (Cont'd)

Entry Name Paragraph (ENP) of the Military Dictionary

873687            :NCE4.
874693            :NCE4.
878698            :NCE4.
880696            :NCE4.
888575            :NCE4.
890570            :NCE4.
8970              :NCE4.
897562            :NCE4.
902692            :NCE4.
930602            :NCE4.
9753              :NCE4.
9241              :SCW5.
9272              :SCW5.
964414            :SCW5.
0450              :SCE6.
1353              :SCE6.
2050              :SCE6.
9052              :SCE6.
915482            :SCE6.
9451              :SCE6.
950440            :SCE6.
952458            :SCE6.
958435            :SCE6.
963436            :SCE6.
965439            :SCE6.
9743              :SCE6.
2129              :SE8.
15TH              :ZONEN.
V                 :ZONEN.
106TH             :ZONEN.
18TH              :ZONEN.
295TH             :ZONEN.
353D              :ZONEN.
VIII              :ZONENC.
26TH              :ZONENC.
28TH              :ZONENC.
110TH             :ZONENC.
112TH             :ZONENC.
78TH              :ZONENC.
299TH             :ZONENC.
352D              :ZONESC.
9TH               :ZONESC.
GROSS             :ZONESC.
942D              :ZONESC.
109TH             :ZONESC.
915TH             :ZONESC.
116TH             :ZONES.
212TH             :ZONES.


Table 2 (Cont'd)

Entry Name Paragraph (ENP) of the Military Dictionary

4TH               :ZONES.
316TH             :ZONES.
320TH             :ZONES.
12TH              :ZONES.
423D              :ZONES.
1ST               :IF WORD (2,2) = '110TH'
                   THEN ZONENC
                   ELSE IF WORD (1,2) = '295TH'
                   THEN ZONEN
                   ELSE ZONESC.
2D                :IF WORD (2,2) = '295TH'
                   THEN ZONEN
                   ELSE IF WORD (1,2) = '316TH'
                   THEN ZONES
                   ELSE IF WORD (1,2) = '942D'
                   THEN ZONESC
                   ELSE ZONENC.
3D                :IF WORD (2,2) = '109TH'
                   THEN ZONESC
                   ELSE IF WORD (2,2) = '110TH'
                   THEN ZONENC
                   ELSE ZONEN.

RECORD COUNT = 00000505, NAME = 0002
COPY COMPLETE


[Figure 2: computer listing of tagged text; not legible in this copy.]

Figure 2. Illustration of Tagged Output Listing


paragraphs.  If  the  analyst  is  interested  in  only  a  few  concepts  and  not 
the  total  number  in  the  dictionary,  he  may  specify  which  concepts  are  to 
be  tabulated  and  which  are  to  be  suppressed.  In  addition,  the  tabulate 
program  provides  index  scores  which  are  obtained  by  the  division  of  the 
various  frequency  scores,  i.e., 

                 total assignments of a given concept
WORD INDEX  =  ----------------------------------------  X 100
                 total words in the entire document
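
As a minimal sketch of the Word Index computation, the following Python fragment counts concept assignments in one report and divides by the report's total word count. The function name `word_index`, the input layout (one list of assigned concepts per word), and the ten-word sample report are illustrative assumptions, not part of the tabulate program itself.

```python
from collections import Counter

def word_index(tags_per_word, total_words):
    """Return {concept: 100 * assignments of that concept / total words}."""
    counts = Counter(tag for tags in tags_per_word for tag in tags)
    return {concept: 100.0 * n / total_words for concept, n in counts.items()}

# Hypothetical 10-word report: each item lists the concepts assigned to one
# word; an empty list means the word was left untagged.
report_tags = [["ENEMY"], [], ["VEHICLES"], ["MOVEMENT"], [], ["LOCATION"],
               [], ["VEHICLES"], [], ["TIME"]]

indices = word_index(report_tags, total_words=len(report_tags))
print(indices["VEHICLES"])  # 2 assignments in 10 words -> 20.0
```

Suppressing a concept, as described above, would amount to dropping its key from the returned mapping before tabulation.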


ANALYSIS 

The various indices, such as the Word Index, provide the basis for
determining the "similarity" among the messages in the data base. Several
alternative approaches are available for the measurement of similarity and
for the determination of the best grouping of messages. In the present
study the Pearson product-moment correlation between pairs of messages,
computed over the 86 categories of the content analysis, was used as the
similarity measure. The eigenvectors of the correlation matrix were then
obtained through a principal components analysis, and these vectors were
rotated to simple structure using the Varimax criterion. Other measures of
similarity and other procedures for determining the underlying structure
of the messages can be found in Johnson (1967), Anderberg (1972), and
Sneath and Sokal (1973).
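
The analysis steps just described can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the routine used in the study: the data are randomly generated stand-ins for Word Indices, loadings are taken as eigenvectors scaled by the square root of their eigenvalues, factors are retained when their eigenvalue exceeds 1.00, and `varimax` is a common textbook iteration for the Varimax criterion. The names `varimax` and `report_factors` are hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a (rows x factors) loading matrix orthogonally by varimax."""
    n_rows, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated * (rotated**2).sum(axis=0) / n_rows
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                      # nearest orthogonal rotation
        if s.sum() < previous * (1 + tol):     # criterion stopped improving
            break
        previous = s.sum()
    return loadings @ rotation

def report_factors(word_index, min_eigenvalue=1.0):
    """word_index has one row per category and one column per report."""
    corr = np.corrcoef(word_index, rowvar=False)       # report-by-report r
    eigenvalues, eigenvectors = np.linalg.eigh(corr)   # ascending order
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    keep = eigenvalues > min_eigenvalue
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    return corr, eigenvalues, varimax(loadings)

# Hypothetical data: 86 categories x 6 reports built from two profiles.
rng = np.random.default_rng(0)
profile_a, profile_b = rng.random(86), rng.random(86)
word_index = np.column_stack(
    [profile_a + 0.1 * rng.random(86) for _ in range(3)]
    + [profile_b + 0.1 * rng.random(86) for _ in range(3)]
)
corr, eigenvalues, rotated = report_factors(word_index)
for report, row in enumerate(rotated, start=1):
    print(report, np.round(row, 2))
```

Each printed row gives one report's loadings on the retained factors; reports built from the same profile should load together on one factor, up to an arbitrary sign.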


RESULTS  AND  DISCUSSION 
KEY  WORD  IN  CONTEXT  (KWIC)  AND  FREQUENCY  COUNT 

As indicated, the first step in the data analysis was to produce a
KWIC and a corresponding frequency analysis of the word occurrences in
the total set of 40 reports. Figure 1 above displays a sample section
of the KWIC for these messages.[1]

The total set of reports contained 3214 word tokens (total words)
and 710 word types (a type being defined as any uniquely spelled word
form). The frequency distribution is typical of that found in most
vocabulary counts. Approximately 50% of all of the word occurrences are
accounted for by the 50 most frequent words, and there are a large
number of word types that occur only once. The words HOURS, INFANTRY,
ENEMY, COORDINATES and ARTILLERY are among the most frequent types.
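
The token, type, and coverage figures quoted above can be obtained with a simple count. The sketch below is illustrative only: `frequency_profile` is a hypothetical helper, words are split on white space, and the twelve-word sample stands in for the actual message set.

```python
from collections import Counter

def frequency_profile(text, top_n):
    """Return (tokens, types, % of tokens covered by top_n types, singletons)."""
    tokens = text.upper().split()
    counts = Counter(tokens)
    top = counts.most_common(top_n)
    coverage = 100.0 * sum(n for _, n in top) / len(tokens)
    singles = sum(1 for n in counts.values() if n == 1)
    return len(tokens), len(counts), coverage, singles

# Hypothetical sample text, not taken from the 40 reports.
sample = ("ENEMY INFANTRY MOVING AT 0600 HOURS "
          "ENEMY ARTILLERY FIRING AT 0630 HOURS")
tokens, types, coverage, singles = frequency_profile(sample, top_n=3)
print(tokens, types, coverage, singles)  # 12 9 50.0 6
```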


[1] The full KWIC and the Frequency Count are available from ARI.


DICTIONARY  TAGGING 

Figure 2, above, presents a sample of the tagging output produced by
the computer routines. The tagged words of this display are from the
first report in the message set.[2]

Table 3 summarizes the total number of word occurrences relative to
the total words of the report (times a factor of 100) which were tagged
in each of the reports for each of the 17 major category divisions of
the Concept Name Paragraph (Table 1). These percentages correspond to
the Word Index described previously (see discussion of Tabulations under
Methods above).

It is the Word Indices for the full 86 categories and subcategories
of Table 1 which will be submitted to analysis (see Table 4 for the
Sequence Numbering used).


REPORT FACTOR STRUCTURE

Table 5 summarizes the factor structure obtained from a principal
components analysis of the report correlations based on the full set of
86 categories of the dictionary. (Three categories have zero occurrences
of identified words. These categories would have yielded zero divisor
checks in calculating the factor scores, and for this reason they were
dropped.)

Five factors accounting for approximately 85% of the total report
variance had eigenvalues greater than 1.00, and they were rotated by
the Varimax criterion to simple structure.

Factor I identifies reports of large scale enemy troop movements or
locations of considerable strategic importance. Factor II identifies
reports of vehicular and small scale movement all along the forward edge
of the battle area. Factor III identifies those reports dealing with
unusual small arms fire. Factor IV identifies deserter and POW reports
of lesser reliability, and Factor V identifies civilian, prisoner of war
interrogation team and reconnaissance reports from reliable sources.

Within each factor, the factor coefficients for each intelligence
message indicate the relative weight of the message on the factor.
For example, Factor II loadings (coefficients) diminish when the report
content deals with foot or patrol activity as compared to convoy and
logistic or tactical support traffic.

An inspection of Table 5 indicates that the dictionary successfully
organized the message sample into a set of factor groupings which are
logically coherent despite the relatively small size of the test
dictionary employed. Each of the factors of this report structure
represents an independent dimension of classification of the total
message sample, with successive dimensions accounting for decreasing
amounts of the report similarities.


[2] The complete tagging outputs for all reports are available from ARI.


Table 3

Summaries of Major Category Tags for Word Index

[Table body not legible in this copy; surviving column labels: AREA OF
OPERATIONS, DIRECTION.]


Table 4

Concept Categories by Sequence Number

 1.  North West              44.  Location
 2.  North East              45.  Terrain
 3.  North Central West      46.  Urban
 4.  North Central East      47.  Armor
 5.  South Central West      48.  Bombers
 6.  South Central East      49.  Observation
 7.  South West              50.  Auditory
 8.  South East              51.  Visual
 9.  Zone North              52.  Defense
10.  Zone North Central      53.  Firing
11.  Zone South Central      54.  Flares
12.  Zone South              55.  Offense
13.  Coordinates             56.  Patrols
14.  Zone                    57.  Morning
15.  Deserters               58.  Afternoon
16.  Enlisted                59.  Evening
17.  Officers                60.  Aerial
18.  POWs                    61.  Surface
19.  Vehicles                62.  Artillery
20.  Trains                  63.  Infantry
21.  Decrease                64.  Water
22.  Increase                65.  Artillery Piece
23.  North                   66.  Machine Gun
24.  East                    67.  Mortars
25.  South                   68.  Small Arms
26.  West                    69.  Tanks
27.  Large                   70.  Direction
28.  Small                   71.  Time
29.  Cognition               72.  Logistics
30.  Reconnaissance          73.  Movement
31.  Army                    74.  Planes
32.  Army Air Corps          75.  Transportation
33.  Corps                   76.  Organization
34.  Division                77.  Organization Status
35.  Regiment                78.  Sensor
36.  Battalion               79.  Extent
37.  Company                 80.  Change
38.  Headquarters            81.  Weapons
39.  Team                    82.  Tactics
40.  Enemy                   83.  Personnel
41.  Friendly                84.  Branch
42.  Civilian                85.  Intelligence
43.  Military                86.  Area of Operations


Table 5

Report Structure Using Concept Category Tags

FACTOR I

REPORT  LOADING  NOTES

  31      .85    Move of 352d Artillery Regiment Headquarters
  13      .84    Buildup of large enemy forces on western slope of
                 Moselle Valley
  16      .83    320th Regiment in reserve. 212, 316 & 423 Regiments
                 calling up reserves
  19      .78    Southward movement of the 116th Panzer Division
  14      .74    Enemy patrol in vicinity of 229th Artillery Battalion
                 Command Post
  38      .73    Panzer and infantry divisions in front of 4th Infantry
                 Division to launch major offensive
  29      .66    Enemy map from 26th Volksgrenadier Division found
  32      .65    Enemy keeping artillery well back from front and no
                 counter fire during past week
   1      .64    352d Volksgrenadier Division located near Riesdorf
  25      .63    *Large formation of enemy infantry around Gemuend
                 preparing for raid into 110th sector
  40      .62    Considerable vehicular activity all along front, enemy
                 observation planes, patrols and searchlights reported

FACTOR II

REPORT  LOADING  NOTES

  20      .82    Truck movement at coordinates 878698
   3      .80    Horse-drawn vehicles and staff car between coordinates
                 890570 and 897562
  22      .75    Vehicular movements vicinity of coordinates 854680
  10      .75    Horse-drawn vehicles vicinity of 888575 and 930602
  28      .74    Vehicles vicinity of 880696
  17      .71    Tracked vehicles, trucks, motors, possibly tanks between
                 coordinates 874693 and 9272
  24      .69    Vehicle movement at coordinates 878698 and vicinity
                 873687
  37      .62    Foot traffic vicinity of 965439, 963436 and 958435
   2      .58    Enemy patrol movement in vicinity of 851581

*Fictitious report


Table 5 (Cont'd)

Report Structure Using Concept Category Tags

FACTOR III

REPORT  LOADING  NOTES

  30      .92    Unusual, intermittent, harassing small arms fire along
                 112 sector front.
  36      .84    Marked increase of harassing small arms fire along 109,
                 110, 112th Regiment fronts
  12      .73    Rifle, pistol and flare activity along 112th sector front
   7      .72    Enemy rifle, pistol and mortar fire along northern sector
                 of 4th Division
   8      .65    Scattered rifle, pistol, mortar, and light artillery fire
                 along 28th Division front
  18      .64    Indiscriminate rifle and pistol firing, unusual motor
                 activity and aggressive enemy patrol activity along
                 106th Division front

FACTOR IV

REPORT  LOADING  NOTES

  23      .78    *Enemy deserter reports machine gun emplacement moved to
                 vicinity 850672 protecting bridge approach
   6      .73    *Enemy deserter reports movement of 15th Army to Cologne-
                 Bonn-Dueren area. Von Rundstedt orders withdrawal
   5      .72    *POW team reports reliable civilians state that mines
                 laid vicinity of Hauler and enemy soldiers in
                 Niedergeckler
  21      .68    Wounded POW reports Headquarters of 26th Division moved
                 up to Eschfeld-Roscheid area
  33      .59    POW reports 1st Battalion, 259th Regiment in Ormont area.
                 Listening post reports increase in vehicle movement

FACTOR V

REPORT  LOADING  NOTES

  26      .71    POW team reports highly reliable civilian reports river
                 crossing equipment, SS Troops, artillery and vehicles
                 moving West from Bitburg
  35      .71    *POW team reports German national reports marshalling
                 yards at Zweibruecken tied up for six hours
  27      .50    Air Force reconnaissance reports considerable activity in
                 marshalling yards at Trier
  39      .40    Abrupt change of routine on other side of Sauer River
                 suggests arrival of new troops

*Fictitious reports


In order to make clearer the origins of these factors, examine the
character of the two highest loading reports of Factor I: Reports 31
and 13. From Table 3 note that Report 31 contains word occurrences which
pertain to nine of the 17 major categories of the dictionary, and that
Report 13 contains word occurrences pertaining to 11 of the categories.
The two reports had word occurrences pertaining to eight common categories:
TIME, ORGANIZATION, ORGANIZATIONAL STATUS, TACTICS, PERSONNEL, BRANCH,
INTELLIGENCE, and AREA OF OPERATIONS. On the basis of the number of
shared categories indicated by their word entries, the two reports have
a correlation of .91, indicating a considerable similarity of content.

Whatever structure emerges is obviously dependent on the adequacy
of the dictionary which is used to process the reports. The factor
groupings of the intelligence reports are based solely on the overlaps of
the concept categories which occur among the reports. If the dictionary
categories as defined are not relevant, the ultimate structure which
emerges will be useless. Additionally, the interpretation which is to be
given to the structure must be made in terms of the categories which are
employed. Thus, for example, the reports grouped under Factor II
significantly share content occurrences in the dictionary category of
Transportation. Factor III reports most prominently share the content of
categories Extent, Change, and Weapons. Such a basis for classification
is eminently reasonable. Shared content is a practical means of
establishing report groupings of significance for intelligence retrieval
and analysis, provided that the content is meaningfully defined. As
indicated, however, the procedure allows an analyst to redefine and change
the categories of the dictionary as he chooses and as required by the
retrieval or organizational needs of the intelligence situation for which
the method is to be used.

Factor I is the most important factor in terms of the amount of report
similarity for which it accounts. It clearly identifies tactical
intelligence of considerable significance. It successfully sifts out those
reports of lesser tactical importance while nonetheless including reports
not having obvious tactical implications unless seen in the context of
the other reports of this factor. Thus, for example, the report of a
small enemy patrol discovered near the 229th Field Artillery Battalion
command post takes on tactical significance when included in the taxon
containing reports of large-scale troop movements and enemy build-ups.
Similarly, the otherwise negative report of an absence of enemy artillery
fire, especially counter-battery fire, when seen in the context of this
factor, implies an enemy stratagem, although otherwise it could imply
(as was actually assumed in 1944) an enemy ammunition shortage and
defensive posture.

Factors I, II, and III, in decreasing order of importance, identify
various kinds of enemy activity, from troop movements and immediate attack
threats through truck and convoy movements to small arms activity.
Factors IV and V pull together hearsay reports of varying degrees of
reliability. Three of the seven false reports (Reports 6, 21, and 23) added
to the message set are found with high loading on Factor IV. Of the
remaining four fictitious reports, one (Report 25) is found with lower
loading on Factor I and another (Report 35) on Factor V. False reports
9 and 15 did not appear with significant loadings under any of the five
factors. It is reasonable to expect that deserter, civilian, and POW
reports would be of lesser reliability than reconnaissance or
interrogation reports, and it is on the basis of these report sources that
the computer algorithm has grouped the reports.


CONCLUSIONS  AND  RECOMMENDATIONS 

The application of a set of procedures based on the content analysis
of a tactical intelligence message set led to the identification of a
multidimensional message structure. This logically coherent structure
could provide assistance to intelligence analysts in the organization and
analysis of the data in the message set. However, the content-analytic
procedures must be refined.

One clear inadequacy of the present dictionary definition language,
as used in producing the above results, is the current inability to define
number ranges. Thus, all clock times, map coordinates, and other numbers
must be represented as uniquely occurring forms. Additionally, the entire
General Inquirer System is programmed only for the IBM S/360 or S/370
computer. Apart from system problems, the generality of the present
results is limited by the relatively small data base employed and by the
use of a specialized dictionary based on and adapted to that data base.
Nevertheless, we believe the results indicate sufficient promise for this
intelligence data organizational scheme to warrant further investigation.
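
To illustrate what a number-range facility might look like, the sketch below classifies four-digit clock times by range instead of listing each time as a separate dictionary entry. The range boundaries are an assumption inferred from the individual time entries of Table 2 (0405 through 1200 tagged MORNING, 1300 through 2000 AFTERNOON, the remainder EVENING); the function `time_of_day` is hypothetical and is not part of the General Inquirer.

```python
def time_of_day(token):
    """Classify a four-digit 24-hour clock time; return None for anything else."""
    if len(token) != 4 or not token.isdigit():
        return None
    hhmm = int(token)
    if hhmm > 2400 or hhmm % 100 > 59:
        return None                 # not a valid clock time (e.g. a coordinate)
    if 401 <= hhmm <= 1200:
        return "MORNING"
    if 1201 <= hhmm <= 2000:
        return "AFTERNOON"
    return "EVENING"                # 2001-2400 and 0000-0400

for token in ["0405", "1300", "2015", "0045", "0696"]:
    print(token, time_of_day(token))
```

A range rule alone cannot separate clock times from four-digit map coordinates that happen to be valid times (0450 appears in Table 2 in both roles), so a practical rule would probably also test neighboring words such as HOURS or COORDINATES.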




References 


Anderberg, Michael R. Cluster analysis for applications. Air Force
Systems Command, Kirtland AFB, 1972. (AD 738 301).

Johnson, S. C. Hierarchical clustering schemes. Psychometrika, 1967,
32, 241-254.

Miller, J. P., and Psathas, G. A 1401-1311 tagging problem for the
General Inquirer. Behavioral Science, 1968, 13, 166-167.

Miron, M. S., Patten, S. M., and Halpin, S. M. Determination of the
Structure of Combat Intelligence Ratings. U.S. Army Research Institute
for the Behavioral and Social Sciences, Technical Paper 286, 1977.

Patten, S. M. An inductive taxonomy of combat intelligence data. U.S.
Army Research Institute for the Behavioral and Social Sciences,
Research Memorandum 74-14, December 1974.

Sneath, Peter H. and Sokal, Robert R. Numerical Taxonomy. San Francisco:
W. H. Freeman & Co., 1973.

Stone, P. J., Bales, R. F., Namenwirth, J. Z. and Ogilvie, D. M. The
General Inquirer: A Computer System for Content Analysis and
Retrieval Based on the Sentence as a Unit of Information. Behavioral
Science, 1962, 7, 484-498.

Stone, P. J., Dunphy, D. C., and Bernstein, A. The Analysis of Product
Image. In P. J. Stone et al. (Eds.), The General Inquirer: A
Computer Approach to Content Analysis. Cambridge, Mass.: MIT Press,
1966, 619-627.

Stone, P. J., Dunphy, D. C., Smith, M. S., and Ogilvie, D. M. The General
Inquirer: A Computer Approach to Content Analysis. Cambridge, Mass.:
MIT Press, 1966.


APPENDIX 


SPECIFICATIONS  FOR  THE  ADAPTATION  OF  THE  GENERAL  INQUIRER  SYSTEM  TO 

ARI  ADP  EQUIPMENT 


Introduction 

The methodology for Data Organization discussed in this report
is an adaptation of the General Inquirer System, which uses a Content
Analysis Dictionary to classify words in a document according to
concept names, or categories, as defined by the user. Building
a new dictionary for documents in a particular environment represents
significant effort and time, but it allows an individual analyst to
investigate his own theories by modifying existing files. The
proposed update routine should expedite any changes or variations
to existing dictionaries and is easily adapted to interactive
terminal use.

The suggested system consists of several programs or modules which
can be executed singly or in sequence as specified by input control
cards or keyboard type-ins. In such a system, the minimum
configuration would include:

1. Text Editor/KWIC

2. Dictionary Compiler

3. Dictionary Update

4. Tagging Run

5. Retrieval Edits

6. Statistical Options

The system flow and interconnections are shown in Figure A-1. The
individual function blocks are described below.


Text Editor/KWIC

This program scans the original documents as entered at a remote
terminal, checks level identifiers and punctuation, flags numerical
strings, and formats data for writing on tape (or disc). Any serious
errors are to be displayed and identified by document ID and sentence
number.

The program makes a word count as the input is processed and
prints a frequency ordered listing followed by an optional KWIC
(Key Word in Context) in alphabetical order, along with the
document identification field. The sorting above involves first the
frequency count and second the alphabetized appearance of each
word found in the text.
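
A minimal sketch of the KWIC portion of such a program is given below. It is an illustration only: the function `kwic`, the three-word context window, the bracket notation for the key word, and the sample report are assumptions rather than features of the proposed module, and the level-identifier and punctuation checks are omitted.

```python
def kwic(documents, width=3, stop_words=frozenset()):
    """Return (keyword, doc_id, context) rows in alphabetical keyword order."""
    rows = []
    for doc_id, text in documents.items():
        words = text.upper().split()
        for i, word in enumerate(words):
            if word in stop_words:
                continue                      # skip articles, prepositions, etc.
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            rows.append((word, doc_id, f"{left} [{word}] {right}".strip()))
    return sorted(rows)

# Hypothetical one-sentence report with a made-up document ID.
reports = {"R01": "ENEMY PATROL IN VICINITY OF BRIDGE"}
rows = kwic(reports, stop_words={"IN", "OF"})
for keyword, doc_id, context in rows:
    print(f"{keyword:<10}{doc_id}  {context}")
```

Stop words are dropped as key words but still appear in the context of neighboring words.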

To shorten the output, certain restrictions may also be imposed
to cause the program to ignore common prepositions, articles and
auxiliary verbs, since the present purpose is not to analyze syntax.


Dictionary  Compiler 

The dictionary input (initially on cards) consists of three
sections in the following order:

Concept Names as defined by the user, with their organization
and structure determined by their starting position on the
card. Each name must be unique. Multiple-word names
connected by hyphens are permissible.

User tag definitions for precoded categories.

Word definitions which specify the concept names to be applied
and under what conditions.

The syntax rules for word definitions (described later) can be
fairly complex, so this program should be run separately to ensure
that there are no format errors, duplicate concept names or
incomplete word statements. If no serious errors are found, the compiled
dictionary is written on tape with an appropriate ID for each of
the three sections mentioned above.

Dictionary  Update/Compile 

This program provides for modifications to be made to an existing
dictionary already on tape. The general types of modification are
the usual DELETE, ADD, and REPLACE functions, with the capability of
rearranging the concept name structure by combining names under a
new category, moving a name from one level to another, creating new
names, or indicating one or more names as synonymous. These
functions eliminate the necessity for making many changes in the concept
names as specified in the word descriptions.

After all changes have been made, the updated version is passed
to the compiler for a new run.


Dictionary  Update 

After an initial tagging run, it may become apparent that some
concept names have low frequency counts and should be combined, or
that other categories are not structured properly. To avoid making
changes to the original card deck for the dictionary concepts and
word definitions, we envision a program to add, delete, or replace
entries. The input may be punched on cards or optionally entered
at a terminal keyboard.

Since the concept structure must be flexible, especially in the
early stages of data organization, some possible functions
contemplated are:

NAME A, B, ...       Define new category names.

ERASE A, B, ...      Delete names from the table, including any
                     lower level structures.

RENAME A B.          Redefine A to be B (new name). A is deleted.

EQUIV A B.           Equivalence A = B.

MOVE A B.            Move A under B. If no B is specified, move
                     means to assign to major level.

BREAK A B, C, ...    Split A into new subcategories B, C, etc.

MERGE A B, C, ...    Merge B, C, ... under category A. B and C are
                     erased. If B or C = A then combine at level A.
                     If no B or C then merge all sublevels.

Care must be exercised in changing the actual structure of concept
names. If a name is erased, the corresponding word definitions may no
longer be valid and may require modification. Such modifications are also
possible, with certain restrictions in procedural order. Deletions or
replacements should be made first, then the additions.

Since the modifications above are being made to a previously compiled
(error free) dictionary, stringent program checks are needed to detect
inconsistencies before the updated version is recompiled and used.
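
Several of the functions listed above can be sketched on a simple parent table, as follows. This is one possible reading of NAME, MOVE, ERASE, and RENAME, not a specification of the update program: the class `ConceptTable`, its method names, and the sample concept names are assumptions made for illustration, and checks such as preventing a name from being moved under its own subcategory are left out.

```python
class ConceptTable:
    """Concept names stored as name -> parent (None means major level)."""

    def __init__(self):
        self.parent = {}

    def name(self, *names):
        for n in names:
            if n in self.parent:
                raise ValueError(f"duplicate concept name: {n}")
            self.parent[n] = None

    def children(self, name):
        return [c for c, p in self.parent.items() if p == name]

    def move(self, a, b=None):
        if a not in self.parent or (b is not None and b not in self.parent):
            raise KeyError("unknown concept name")
        self.parent[a] = b            # b=None assigns A to the major level

    def erase(self, *names):
        for n in names:
            for child in self.children(n):
                self.erase(child)     # lower level structures go too
            del self.parent[n]

    def rename(self, a, b):
        if b in self.parent:
            raise ValueError(f"duplicate concept name: {b}")
        self.parent[b] = self.parent.pop(a)
        for child in self.children(a):
            self.parent[child] = b

table = ConceptTable()
table.name("WEAPONS", "MORTARS", "TANKS", "ARMOR")
table.move("MORTARS", "WEAPONS")
table.move("TANKS", "WEAPONS")
table.rename("TANKS", "ARMORED-VEHICLES")
table.erase("ARMOR")
print(sorted(table.children("WEAPONS")))
```

After an ERASE or RENAME, any word definition that still refers to the old name would have to be found and revised, which is the consistency check called for above.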



Tagging  Run 

Input to the tagging program consists of edited text and a
compiled dictionary. The most efficient method of processing is
to store the entire dictionary in core memory and process the text
one report or document at a time. If core memory is limited, the
run time will be considerably longer and other techniques will have
to be evaluated.

Each word in the text is first matched against the words
defined in the dictionary. If no match is found, the word is
added to the "left over"* list and the next word is processed. If
a match is found, the concept name(s) associated with the definition
are appended to the text word and will appear in the output listing.
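
The matching step can be sketched as below, using three entries adapted from Table 2. The sketch rests on an assumed reading of the dictionary syntax, namely that WORD (m,n) = 'X' is satisfied when X occurs at any position from m to n words relative to the entry word; the data layout, the function names `window` and `tag_sentence`, and the sample sentence are illustrative and do not reproduce the General Inquirer's internal format.

```python
def window(words, i, start, stop):
    """Words at relative offsets start..stop (inclusive) from position i."""
    lo = max(0, i + start)
    hi = max(0, min(len(words), i + stop + 1))
    return words[lo:hi]

# entry -> (list of (start, stop, target word, tags if found), default tags)
DICTIONARY = {
    "RIVER": ([], ["WATER"]),
    "REAR": ([(1, 3, "AREA", ["LOCATION"])], ["ORGANIZATION"]),
    "TIGER": ([(-1, 1, "TANKS", ["TANKS", "ENEMY"])], []),
}

def tag_sentence(words, dictionary):
    """Return (tagged words with their concept names, left-over words)."""
    tagged, left_over = [], []
    for i, word in enumerate(words):
        if word not in dictionary:
            left_over.append(word)            # no dictionary match
            continue
        conditions, default = dictionary[word]
        tags = default
        for start, stop, target, then_tags in conditions:
            if target in window(words, i, start, stop):
                tags = then_tags              # first satisfied condition wins
                break
        tagged.append((word, tags))
    return tagged, left_over

tagged, left_over = tag_sentence("TROOPS IN REAR AREA NEAR RIVER".split(),
                                 DICTIONARY)
print(tagged)
print(left_over)
```

Here REAR is tagged LOCATION because AREA falls within the next three words; in a sentence without AREA it would receive the default tag ORGANIZATION.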

Since  the  summaries  or  counts  are  kept  for  each  sentence  within  a 
report  or  document,  the  level  identification  and  sentence  number  are 
sufficient  to  identify  each  appearance  of  the  tagged  words.  The 
total  number  of  sentences  per  report  or  document  and  the  number  of 
words  per  sentence  are  also  retained.  Relative  summary  counts  can 
be  calculated  based  on  either  of  these  totals. 

Output  from  this  program  is  a  listing  for  each  document,  each  line 
of  text  with  one  or  more  concept  names  at  all  levels  appearing  directly 
below  the  tagged  words. 

A  summary  for  each  document  appears  at  the  end  with  raw  and  rela¬ 
tive  frequencies  given  for  each  category  in  low  to  high  level  organi¬ 
zation.  These  values  are  also  written  on  tape  with  the  appropriate 
identifiers  and  can  be  used  by  other  modules  such  as  the  Transpose 
and  Statistical  Analysis  programs. 

On  the  first  tagging  run  with  a  new  dictionary,  this  format  pro¬ 
vides  a  complete  reference  for  checking  the  concept  structure  and 
finding  possible  redundant  definitions  in  the  word  specifications. 

Statistical  Options 

From  the  experience  gained  in  the  test  project,  we  recommend  that 
summaries  of  the  total  number  of  words  tagged  in  a  report  or  document 
relative  to  the  number  of  words  in  the  document,  i.e.,  the  word  index 
of  the  Method  section,  be  used  as  the  basic  numerical  index  for  corre¬ 
lational  analysis  of  the  report  patterns.  These  Indices  may  be  employed 
in  any  standard  factor  analysis  routine,  a  FORTRAN  version  being 
appended  to  the  tag  file  output  of  the  content  analysis  routines. 


* Untagged.




Retrieval Edits

Although retrieval may not be specifically required, it is
desirable in order to complete the system. The proposed system
would use a request format similar to the dictionary syntax except
that a different set of action words would be employed to describe
the various types of output desired. These control words are LIST,
CONCEPTS, and TAB. Each request is treated separately. The
previously tagged text is scanned and a report is constructed according
to the conditions satisfied.

LIST       All sentences.

CONCEPTS   Words tagged.

TAB(n)     Summary tables for level n.

The form for a request may be compound or conditional as in the
word definitions (minus the "word:"). The program prints the
sentences, concept names and tagged words, or summary tables for the
conditions specified.

Example:  IF TAG (n1,n2) = OFFICER
          THEN TAB (2)
          ELSE.

In this context n1 and n2 refer to the word count relative to the
beginning of the sentence.
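
The tag test in a request of this kind might be sketched as follows. This is an illustrative Python sketch only: the function name, the 1-based word positions, and the representation of tagged text as lists of (word, concept names) pairs are assumptions rather than details taken from the report.

```python
def sentences_with_tag(tagged_sentences, concept, n1=None, n2=None):
    """Return the numbers of the sentences in which `concept` is attached
    to a word lying between positions n1 and n2, counted from the start
    of the sentence (1-based). With no range, the whole sentence is used.
    """
    hits = []
    for number, sentence in enumerate(tagged_sentences, start=1):
        first = 1 if n1 is None else n1
        last = len(sentence) if n2 is None else n2
        window = sentence[first - 1:last]
        if any(concept in concepts for _word, concepts in window):
            hits.append(number)
    return hits
```

A request processor would then print the sentences, tagged words, or summary tables for the sentence numbers returned.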


Input  Conventions 

A document can be defined as a unit of text containing one or more
sentences, such as a message (report), abstract, or paragraph. Analysis
is done for each sentence so the grouping can be arbitrary. Although
input is basically free form, certain punctuation marks are to be
reserved for special functions such as identification, comments, or
user tags.


Terminal Punctuation [ . ! ? ]

The end of a phrase or sentence is defined by a period, exclamation
mark, or a question mark. Normal rules of punctuation are followed
except when ending a quotation such as . . .".




Special Delimiters [$ [ ] (( )) { } * /]

$ - $      Characters between dollar signs are considered to
           be document titles. They are used as output headers
           and are not tagged.

[ - ]      Identification codes are enclosed between left and
           right brackets and, if used, must appear at the
           beginning of a document. The level of identification
           is indicated by the number of brackets; for example,
           [[PZ123]] is the ID at Level 2.

(( - ))    Double sets of parentheses can be used for hand
           coding a synonym for the preceding word, which is
           not defined in the dictionary. An example might
           be:

               . . . TIGERS ((TANKS))

           where the word TANKS does not appear in the original
           text, but is described in the dictionary. The word
           TIGERS will be tagged as if it were equivalent to
           TANKS.

{ - }      Braces are used to set off comments or explanations
           written by the coder and are not tagged. Since these
           characters do not appear on a keypunch the convention
           could be #( and #) to produce the left and right braces.
           For teletype or keyboard characters the equivalents
           might be A and A .

*          An asterisk is the signal for end-of-document at level
           one. Corresponding to the identification of levels,
           the number of asterisks defines the level index. At
           least one space must precede each set of asterisks,
           and the sets appear in ascending order.

           Example:  . . . the end. * **

           The single * flags the end of level 1 and ** is the end
           of level 2, where the level index, low to high, goes
           from 1 to n - 10. This hierarchy may seem to be the
           converse of formal ordering but makes it convenient to
           add a new level without reordering identification of
           the "lower" level documents.

/          A slash followed by one or more letters is inserted by
           the coder to hand tag special words or numbers. If a
           space immediately follows the / the word itself is not
           tagged and may be followed by a word inserted by the
           coder.
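
The delimiter conventions above can be illustrated with a small pre-scan. This is a minimal Python sketch, not the proposed scan program: the regular expressions, the function name, and the returned field names are all assumptions, and slash tags are left in the text for a later step.

```python
import re

TITLE = re.compile(r"\$([^$]*)\$")                 # $ title $
IDENT = re.compile(r"(\[+)([^\[\]]*)(\]+)")        # [id], [[id]], ...
COMMENT = re.compile(r"\{[^{}]*\}")                # { coder comment }
SYNONYM = re.compile(r"(\S+)\s*\(\(([^()]*)\)\)")  # WORD ((SYNONYM))
END_MARK = re.compile(r"(?<=\s)\*+(?=\s|$)")       # " *", " **", ...

def parse_document(raw):
    """Separate titles, IDs, comments, synonyms and end marks from text."""
    titles = [t.strip() for t in TITLE.findall(raw)]
    body = TITLE.sub(" ", raw)
    ids = {}
    for opening, code, closing in IDENT.findall(body):
        if len(opening) == len(closing):
            ids[len(opening)] = code.strip()   # bracket count = ID level
    body = COMMENT.sub(" ", IDENT.sub(" ", body))
    # The coded word stays in the text; its synonym is used for tagging.
    synonyms = {word: syn.strip() for word, syn in SYNONYM.findall(body)}
    body = SYNONYM.sub(r"\1", body)
    end_levels = [len(mark) for mark in END_MARK.findall(body)]
    body = END_MARK.sub(" ", body)
    return {
        "titles": titles,
        "ids": ids,
        "synonyms": synonyms,
        "end_levels": end_levels,
        "text": " ".join(body.split()),
    }
```

The remaining text would then be passed to the word scan, with the synonym table consulted when a coded word is looked up in the dictionary.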




Normal Punctuation [ ( ) : ; , " ' ]

Single sets of parentheses are treated as enclosing a conventional
parenthetical phrase. The other characters are also handled in the
usual sense and quote marks must appear in pairs. All of the above
also act as word delimiters (with or without surrounding spaces),
except for the apostrophe which is considered as part of the word,
e.g., "o'clock" or "don't."

Miscellaneous [ - . ]

The hyphen has two possible uses. If used to separate compound
words such as HALF-BAKED, there must be no spaces and the word is
treated as a single unit. To indicate a pause or break, one or more
hyphens may be used if surrounded by blanks.

A single period used as a decimal point can be recognized as part
of a decimal number with certain restrictions. 3.1416 or 3.0 will be
treated as a number string, but 10. will appear as an end of sentence
marker.

The  convention  to  indicate  abbreviations  is  flexible,  but  the 
simplest method, which we recommend, is that the periods be omitted
so  as  not  to  be  confused  with  sentence  terminal  punctuation. 


Card  Format 

Each document should start on a new card in order to facilitate
listings, corrections, or the insertion of a new level of
identification. The proposed scan program will not require it, however.

The original text is punched as if for typing, with two exceptions.

-  The character set is assumed to contain only upper case
   characters, the digits 0 through 9, and standard punctuation
   marks or characters belonging to the ASCII code* (limited
   EBCDIC).**

-  Continuation cards are considered as an extension (past column
   80) of the preceding card. Words should be hyphenated from
   line to line only if the hyphen is part of the word. A word
   ending in column 80 should be followed by a card with a blank
   in column 1.

One  or  more  blanks,  or  any  of  the  normal  punctuation  marks  are  word 
delimiters. As mentioned above, the exceptions are hyphens within
word  strings,  a  single  period  in  a  number  string,  and  the  apostrophe. 
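
The word and sentence conventions can be sketched as a small tokenizer. The following Python fragment is an illustration under stated assumptions: the pattern and function name are hypothetical, the special delimiters ($, brackets, braces, slashes, asterisks) are assumed to have been handled already, and input is folded to upper case to match the assumed character set.

```python
import re

TOKEN = re.compile(r"""
    \d+\.\d+                       # decimal number such as 3.1416
  | [A-Z0-9]+(?:[-'][A-Z0-9]+)*    # word with internal hyphens or apostrophes
  | [.!?]                          # terminal punctuation
""", re.VERBOSE)

TERMINALS = {".", "!", "?"}

def split_sentences(text):
    """Split text into sentences, each a list of word strings.

    Blanks and normal punctuation separate words and are dropped, as is
    a hyphen surrounded by blanks (a pause). A period ends the sentence
    unless it lies between digits.
    """
    sentences, current = [], []
    for token in TOKEN.findall(text.upper()):
        if token in TERMINALS:
            if current:
                sentences.append(current)
                current = []
        else:
            current.append(token)
    if current:
        sentences.append(current)
    return sentences
```

For instance, `split_sentences("Count was 10. Don't fire")` yields two sentences, the first ending at `10` because a period not followed by a digit is treated as terminal punctuation.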


*  American Standard Code for Information Interchange - a 7 or 8 bit code
for teletypes/keyboards.

** Extended Binary-Coded-Decimal Interchange Code (8-bit code).




Special  Cases 


t=  %  0  =] are legal, but if not set off by spaces will be
considered  as  part  of  a  word. 

[  \t4]  are  characters  which  might  be  used  for  special  controls 
if  data  are  entered  on-line  via  a  teletypewriter  or  keyboard. 

Illegal  codes  [(t  T  L.  |]  will  be  ignored  but  in  any  event  are 
normally  not  available  on  teletypewriters  and  most  terminal  keyboards. 


Content  Analysis  Dictionary 

Specifications  for  the  dictionary  consist  of  three  separate 
sections - Concept Names, User Tags, and Word Definitions. The first
step  in  compiling  a  new  dictionary  would  be  to  run  a  KWIC  on  part  or 
all  of  the  text  and  consider  the  general  groups  of  interest,  the  high 
frequency  words,  and  any  other  phrases  or  words  which  require  special 
consideration  because  of  their  unusual  or  local  context.  In  general, 
articles, prepositions, and auxiliary verbs are of little interest
and  would  be  specified  as  part  of  a  NOT  table. 

Concept  Names 

The proposed adaptation would define the correspondence between
concept  names  and  tag  numbers  by  the  expression: 

NAME  =  Tag  Numbers. 

This relationship can be a simple one-to-one correspondence or a more
complicated one to describe major, minor, and one or more subcategories
in the form:

Major name   =  t1.

Minor name   =  t1, t2.

Subdivision  =  t1, t2, t3.

Etc.

A concept name must start with a letter and contain only the
letters A - Z, digits 0-9, and the special character "hyphen."
Each name must be unique and should contain no more than 20
characters, and there should be a reasonable limit, such as 10, on
the levels of categories.
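
The naming rules lend themselves to a simple check when a dictionary is compiled. The following Python sketch is illustrative only; the function name and the form of the returned problem list are assumptions.

```python
import re

# A letter, then up to 19 more letters, digits or hyphens (20 in all).
NAME = re.compile(r"[A-Z][A-Z0-9-]{0,19}\Z")

def check_concept_names(names):
    """Return (name, problem) pairs for names that break the rules."""
    problems = []
    seen = set()
    for name in names:
        if not NAME.match(name):
            problems.append((name, "invalid"))
        elif name in seen:
            problems.append((name, "duplicate"))
        seen.add(name)
    return problems
```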




User  Tags 


In some cases, certain words or numbers in the text cannot be
easily described by word definitions and yet the human editor or
coder wants to mark these as belonging to a specific category. An
example from the messages might be to tag all map coordinates with
a C, such as:

". . . found in the vicinity of 854584/C."

To  equate  the  C  to  the  desired  concept  name,  already  appearing  in  the 
level organization under Concept Names, use:

C  =  COORDINATES. 

Hand  coded  tags  must  be  single  letters  but  more  than  one  may  be 
applied  to  a  single  word  such  as: 

. . . vicinity of 854584/CR.

where  C  is  as  above  and  R  might  be  assigned  to  indicate  an  area  east  of 
the  river  such  as: 

R  =  RIVER-EAST. 

If no letter immediately follows a "word/" the tagging of that word
will be inhibited. Such instances are usually followed by a coder's
insert as in the following example:

. . . of automatic/ ((pistol)) or machine gun fire.

The  word  pistol  will  be  tagged  according  to  the  category  assignment 
in the dictionary but the word "automatic" will not.
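
How a scan might interpret hand-coded tags can be sketched as follows. This Python fragment is an illustration only: the function name and return convention are assumptions, and the tag table simply repeats the two user tags used in the examples above.

```python
USER_TAGS = {"C": "COORDINATES", "R": "RIVER-EAST"}

def apply_user_tags(token, user_tags=USER_TAGS):
    """Split a text token into (word, hand-coded concept names).

    Returns None for the concepts when no slash is present, meaning the
    normal dictionary lookup should proceed. A bare slash returns an
    empty list, meaning tagging of the word is inhibited. An undefined
    tag letter raises KeyError.
    """
    word, slash, letters = token.partition("/")
    if not slash:
        return word, None
    return word, [user_tags[letter] for letter in letters]
```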

Dictionary  Words 

Word  definitions  or  statements  comprise  the  major  part  of  the 
dictionary.  They  do  not  contain  the  conventional  synonyms,  uses  and 
explanations but are declarations which specify to which category the
word belongs, and the rules or conditions for classification. The
general form is:

Word:  Operation  String. 

Since  words  in  the  text  are  subject  to  "chopping",  common  plural  forms 
for  nouns  and  tense  forms  for  regular  verbs  need  not  be  entered.  The 
algorithm used will be described later.

The operation string may be simple, compound or complex. The
simplest  form  is: 

Word:  Concept  Name. 

Example:  SNOW: WEATHER.



A compound expression is a series of Concept Names separated by
semi-colons (;) and ending with a period (.).

Example:  SNOW: VISIBILITY; TRAFFICABILITY; PRECIPITATION.

A complex string uses key words to describe conditional tests
for classification assignment in the form:

Word:  IF Conditional expression
       THEN Clause A.
       ELSE Clause B.

The interpretation is as follows: IF the conditions in the expression
are true (i.e., match equally or produce a logically true result),
THEN process Clause A; ELSE (otherwise) process Clause B. Clause A or
Clause B may, in turn, be another conditional expression. The
directives IF, THEN and ELSE are reserved for use as control words and
may not appear as Concept Names.

Another key word is EXIT which signals the logical end of a
conditional path as distinguished from the period at the end of the
operation string. If Clause A does not end with the EXIT directive
the path drops through to the end of the ELSE branch, unless it is a
nested conditional expression.

The  control  word  CALL  allows  an  equivalent  definition  to  be  used 
rather  than  having  to  write  the  same  statements  several  times.  Exam¬ 
ples  are  synonyms  or  abbreviations. 

Example:  RGT: CALL REGIMENT.
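
Simple, compound, and CALL statements can be read into a lookup table along the following lines. This is a minimal Python sketch under stated assumptions: conditional (IF) statements are not handled, the function names are hypothetical, and the REGIMENT entry used in the usage note is invented for illustration.

```python
def parse_statement(statement):
    """Parse 'WORD: NAME; NAME.' or 'WORD: CALL OTHER.' into a pair."""
    word, _, operation = statement.strip().rstrip(".").partition(":")
    parts = [part.strip() for part in operation.split(";") if part.strip()]
    if len(parts) == 1 and parts[0].startswith("CALL "):
        return word.strip(), ("CALL", parts[0][len("CALL "):].strip())
    return word.strip(), parts

def build_dictionary(statements):
    return dict(parse_statement(s) for s in statements)

def concepts_for(word, dictionary):
    """Follow CALL references until a list of concept names is reached."""
    seen = set()
    while word in dictionary and word not in seen:
        seen.add(word)                  # guards against circular CALLs
        entry = dictionary[word]
        if isinstance(entry, tuple):    # ("CALL", target)
            word = entry[1]
            continue
        return entry
    return []                           # undefined word: "left over"
```

With the statements `SNOW: VISIBILITY; TRAFFICABILITY; PRECIPITATION.`, `REGIMENT: UNIT.` and `RGT: CALL REGIMENT.`, a lookup of RGT returns the concept list defined for REGIMENT.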

Options possible under the IF expression are TAG, WORD, CHOP, and
two special ones, ID and NUMH. The form allows for equal, not equal,
and combinations of the logical functions AND, OR, NOT.

Example:  IF key (n1, n2) = a op
          key (n1, n2) = b
          THEN . . .

where key is one of the options mentioned above; n1 and n2 define the
search range in the text relative to the current word. n1 and n2 refer
to word counts and the search can be backward (negative) or forward
(positive). If n1 and n2 are not specified the entire sentence is
searched.


"Op" refers to one of the logical tests where the symbols used are:

i/t  for  NOT 

«  for  AND 

/  for  OR 

The tests are performed in the order listed.

Unless the text has been previously tagged, "IF TAG" in a forward
direction should be used with care since ambiguity is possible in
the case of complex expressions.

Example:  PATROL:  IF WORD (-3,-1) = ENEMY /
                      WORD (-2,-1) = GERMAN
                   THEN ENEMY-UNIT: EXIT
                   ELSE IF TAG = SECTORA
                        THEN RECON
                        ELSE.

When the word PATROL or PATROLS is encountered in the text, a search
is made in the preceding three words for a match with ENEMY, or in the
preceding two words for a match with GERMAN. If either word is found,
then assign the concept name ENEMY-UNIT and exit, i.e., end of
definition.

If  neither  match  is  made,  then  search  the  entire  sentence  for 
concept  SECTORA.  If  a  match  is  found,  assign  the  tag  RECON  since  SECTORA 
is  known  to  refer  to  friendly  territory. 
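
The logic of the PATROL definition can be traced in a short sketch. The following Python fragment hard-codes that one definition for illustration; it is not a general interpreter for the dictionary language. The helper names and the representation of a sentence as (word, set of concept names) pairs are assumptions, and the sentence is assumed to be tagged already so that the SECTORA test has something to find.

```python
def window(sentence, index, n1=None, n2=None):
    """Words from index+n1 to index+n2 inclusive, or the whole sentence."""
    if n1 is None or n2 is None:
        return sentence
    lo = max(0, index + n1)
    hi = min(len(sentence), index + n2 + 1)
    return sentence[lo:hi] if hi > lo else []

def word_in(sentence, index, target, n1=None, n2=None):
    return any(w == target for w, _tags in window(sentence, index, n1, n2))

def tag_in(sentence, index, concept, n1=None, n2=None):
    return any(concept in tags for _w, tags in window(sentence, index, n1, n2))

def tag_patrol(sentence, index):
    """Concept names for the word PATROL found at position `index`."""
    if (word_in(sentence, index, "ENEMY", -3, -1)
            or word_in(sentence, index, "GERMAN", -2, -1)):
        return ["ENEMY-UNIT"]      # THEN branch ends with EXIT
    if tag_in(sentence, index, "SECTORA"):
        return ["RECON"]           # nested IF TAG over the whole sentence
    return []                      # empty ELSE: no tag assigned
```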

"CHOP"  is  similar  to  WORD  except  that  the  match  word  determines 
the  mask  or  number  of  characters  to  be  used  in  the  search. 

Example:  DIRECTION:  IF  CHOP  (-100,  -1)  =  NORTH 

When  the  word  DIRECTION  is  encountered,  a  search  of  the  preceding 
words  is  made,  beginning  at  the  first  word  in  the  sentence  (arbitrarily 
specified by the -100 range limiter) and will pick up any of the
following: NORTH, NORTHEAST, NORTHWEST, NORTHERLY, NORTHERN, etc.
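
The masking behavior of CHOP amounts to a prefix comparison whose length is set by the match word. A minimal Python sketch follows; the function name and the plain list-of-words input are assumptions made for illustration.

```python
def chop_match(words, index, pattern, n1, n2):
    """Return words in the range index+n1 .. index+n2 (inclusive) whose
    leading characters equal `pattern`; the pattern length is the mask.
    """
    lo = max(0, index + n1)
    hi = min(len(words), index + n2 + 1)
    if hi <= lo:
        return []
    return [w for w in words[lo:hi] if w[:len(pattern)] == pattern]
```

An oversized backward limit such as -100 is simply clipped to the first word of the sentence.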

The  other  key  word  "ID"  has  a  slightly  different  interpretation. 

The parameters n1 and n2 specify the first and last characters,
inclusive, to be checked in the current ID field. An optional third
parameter n3 refers to the level of identification, with a default
value of 1.




Example:  HOURS:  IF ID (1,2,2) = 10
                  THEN DEC10
                  ELSE.

The first two characters in the current ID field, level 2, are
compared with 10. If a match is found, assign the name DEC10,
otherwise do nothing.
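
The ID test reduces to a substring comparison on the identification code for the chosen level. In the Python sketch below the function name, the dictionary of ID codes keyed by level, and the sample code value are assumptions.

```python
def id_matches(ids, first, last, value, level=1):
    """Compare characters first..last (1-based, inclusive) of the ID
    field at `level` with `value`. A missing level never matches.
    """
    field = ids.get(level, "")
    return field[first - 1:last] == value
```

For a document whose level 2 code is the hypothetical string `101530`, `id_matches(ids, 1, 2, "10", 2)` is true, so HOURS would receive the name from the THEN clause.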


Chopping  Algorithm 

To  reduce  the  size  of  the  dictionary  (and  the  work  involved  in 
its preparation), the strategy used in the General Inquirer is to
define an algorithm which will "chop" a text word by removing the
most common suffixes to find the root. The corresponding rules
for prefixes are much more difficult and will not be attempted.

During  the  tagging  run,  each  word  in  the  text  (  <  20  characters) 
is  first  matched  against  the  words  defined  in  the  dictionary.  If  an 
exact  match  is  found,  the  tag  numbers  are  affixed  to  the  word  and 
saved  for  printing  purposes. 

Ideally,  the  index  for  any  match  on  the  first  four  letters  could 
be  saved  to  eliminate  a  complete  re-scan  in  the  second  search. 

The  next  step  is  to  subject  the  word  to  a  series  of  tests  for 
the  most  common  endings,  double  letters,  and  adverbial  endings  by 
removing the letters 's', 'ing', and 'ed'. If none of these
truncations is possible the word is considered to be undefined and no
tagging  is  done.  If  chopping  is  successful,  the  shortened  word  is 
rematched  against  the  dictionary. 

The  obvious  exceptions  are  the  non-standard  forms  of  irregular 
verbs such as come/came and some noun plurals such as man/men.
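
The two-pass lookup can be sketched as follows. This Python fragment is a simplified illustration, not the General Inquirer's actual rule set, which the report does not list: the suffix order, the minimum root length, and the treatment of doubled final letters are assumptions.

```python
SUFFIXES = ("ING", "ED", "S")

def chop_candidates(word):
    """Yield possible roots obtained by removing a common suffix."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            root = word[:-len(suffix)]
            yield root
            if len(root) >= 2 and root[-1] == root[-2]:
                yield root[:-1]    # doubled letter: STOPPED -> STOP

def find_entry(word, dictionary):
    """Return the dictionary word matching `word`, or None if undefined."""
    if word in dictionary:         # first pass: exact match
        return word
    for root in chop_candidates(word):
        if root in dictionary:     # second pass: chopped word
            return root
    return None
```

Irregular forms such as CAME still return None and would need their own dictionary entries, for example through CALL.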

Problem  Areas 

The General Inquirer has one serious shortcoming - the inability
to recognize or perform tests on numerical strings. One solution is
to provide for automatic tagging of "words" beginning with any digit
0 through 9. This process would not conflict with user's tags or
any  qualifiers  in  the  dictionary  word  definitions. 

Internal  tags  would  be  generated  for  the  different  forms  such  as 
30  (rounds),  6-man  (patrol),  and  4th  (Division).  There  is  a  need  to 
recognize  those  distinct  forms  and  also  to  provide  some  means  of 
tagging  them  within  the  proposed  framework. 


The specific problems which appeared in the selected messages for
the study were the map coordinates, 4 or 6 digits, times in 24 hour
notation, and the numerical designations for military units.
Documents from another environment would no doubt exhibit other
peculiarities.

Several  different  approaches  have  been  tried  but  none  so  far  has 
resulted  in  a  format  consistent  with  the  present  system.  The  24 
hour clock introduces a modular concept and the coordinates are
scale dependent. In addition, the areas of interest are not
necessarily  nice  neat  rectangles  and  would  complicate  the  description 
of  given  geographical  regions. 

We  suggest  the  solution  of  introducing  a  fourth  section  to  the 
dictionary specifications under the heading "RANGE." Each word which
might  have  a  numerical  value  associated  with  it  would  also  have  a 
corresponding  entry  as,  for  example,  with  respect  to  time: 

Example:  RANGE (HOURS,3)

          0401-1200: MORNING.
          1201-2000: AFTERNOON.
          2001-0400: EVENING.

RANGE  is  a  special  code  word,  HOURS  refers  to  the  word  in  the  dictionary 
and  3  specifies  the  number  of  entries  in  the  table.  MORNING,  AFTERNOON 
and EVENING are concept names.

Corresponding  to  this  declaration  is  an  expanded  expression  to  define 
the  word  "HOURS",  using  two  additional  control  words. 

Example:  HOURS:  IF COMP (-5,-1)
                  THEN RANGE
                  ELSE.

COMP  is  a  special  function  which  will  return  a  true  or  false  result 
after testing for the presence of a numerical word in the preceding
5 words. If the answer is true then execute RANGE - meaning find the
table  corresponding  to  HOURS  and  compare  number  with  entries.  If  the 
number  falls  within  the  limits  assign  the  corresponding  concept  name. 

The  RANGE  directive  must  always  be  preceded  by  the  COMP  test,  which 
finds  and  saves  the  pointer  to  the  numerical  string. 
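
The COMP and RANGE pair, including the wrap past midnight that the 24 hour clock requires, might be sketched as follows. This Python fragment is an illustration only: the table layout, the function names, and the choice to take the nearest preceding number are assumptions, not details given in the report.

```python
RANGES = {
    "HOURS": [
        (401, 1200, "MORNING"),
        (1201, 2000, "AFTERNOON"),
        (2001, 400, "EVENING"),    # upper limit below lower: wraps midnight
    ],
}

def in_range(value, low, high):
    if low <= high:
        return low <= value <= high
    return value >= low or value <= high

def comp(words, index, n1, n2):
    """Return the nearest all-digit word in index+n1 .. index+n2, or None."""
    lo = max(0, index + n1)
    hi = min(len(words), index + n2 + 1)
    for word in reversed(words[lo:hi] if hi > lo else []):
        if word.isdigit():
            return word
    return None

def tag_range(word, words, index, n1=-5, n2=-1, ranges=RANGES):
    """Apply IF COMP (n1,n2) THEN RANGE ELSE. for `word` at `index`."""
    number = comp(words, index, n1, n2)
    if number is None:
        return []                  # COMP false: empty ELSE branch
    value = int(number)
    return [name for low, high, name in ranges.get(word, [])
            if in_range(value, low, high)]
```

Map coordinates and unit designations would need different comparison rules, which is the generalization problem raised in the text.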




Further  study  is  needed  to  accommodate  other  number  representations 
in addition to the problems mentioned above. Combinations of hand
tagging, character manipulation, and extracting capabilities should all
be explored to come up with a generalized method rather than having
to  create  additional  functions  for  each  new  application. 


